a blog for those who code

Sunday, 2 September 2018

How to Index Documents in SOLR

Solr is an open-source search platform built on Lucene that makes searching fairly easy. In this tutorial we will see how to index documents in SOLR. Indexing means adding content to the SOLR index so that it becomes searchable.

An advantage of using SOLR is that it can accept data from many sources, whether a JSON file, an XML file or a database. Whatever the source, the data is fed to SOLR in the same way: as a document. A document contains multiple fields, each with a name and content.
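
As a rough illustration of the document model described above, this is what a single document might look like in SOLR's XML update format. The field names (id, title, author) and their values are hypothetical and chosen only for illustration; the actual fields depend on your schema.

```xml
<add>
  <doc>
    <!-- each field has a name (the attribute) and content (the element text) -->
    <field name="id">book-1</field>
    <field name="title">An Example Title</field>
    <field name="author">Jane Doe</field>
  </doc>
</add>
```

A message like this can be posted to the update handler directly. With the DataImportHandler, equivalent documents are built for you from database rows.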

Since I mainly work with relational databases, I use the DataImportHandler to get the data out of the database. The main steps to follow while indexing documents are:

  • Read the data
  • Create SOLR documents
  • Differentiate between Full and Partial Update (Inserting and Updating).

DataImportHandler

The next step is to configure the DataImportHandler in the solrconfig.xml file. You need to provide the location of the data-config file, which contains the configuration to fetch the data, read it and process it to create SOLR documents.

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">pathtoconfigfile.xml</str>
  </lst>
</requestHandler>
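
The handler class above ships in a separate jar, so solrconfig.xml usually also needs a lib directive to load it. A sketch, assuming a standard SOLR distribution layout where the jar sits in the dist directory; adjust the path to match your installation:

```xml
<!-- assumed location: the dist/ directory of the SOLR installation -->
<lib dir="${solr.install.dir:../../../..}/dist/" regex="solr-dataimporthandler-.*\.jar" />
```

The JDBC driver jar for your database also has to be on the classpath, for example in the core's lib directory.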

A sample data-config file looks like this:
<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="URL of Data Source" user="sa" />
    <document>
        <entity>
             <!-- the entity which you would like to import -->
        </entity>
    </document>
</dataConfig>
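
To make the skeleton above more concrete, here is a sketch of what a filled-in entity might look like. The table name (item), its columns (id, name), the entity name and the target field names are all hypothetical. The attributes query, column and name are the ones the DataImportHandler uses to map result rows to document fields.

```xml
<dataConfig>
    <dataSource driver="org.hsqldb.jdbcDriver" url="URL of Data Source" user="sa" />
    <document>
        <!-- hypothetical entity: one SOLR document per row of the "item" table -->
        <entity name="item" query="select id, name from item">
            <!-- map each result column to a field in the SOLR schema -->
            <field column="id" name="id" />
            <field column="name" name="name" />
        </entity>
    </document>
</dataConfig>
```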

The commands supported by the DataImportHandler are:

  • Abort - Aborts an import that is currently running.
  • Full-Import - Imports all the documents from the database. Querying is not blocked while a full import is running. A full import usually takes some time, depending on the number of documents.
  • Delta-Import - Imports only new documents and changes made to existing documents since the last import.
  • Status - Shows the number of documents created, updated or deleted.
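
A delta-import only works if the entity tells the DataImportHandler how to find the changed rows. A minimal sketch, assuming a hypothetical item table with a last_modified timestamp column; ${dataimporter.last_index_time} and ${dataimporter.delta.id} are variables supplied by the DataImportHandler:

```xml
<!-- hypothetical table and column names; pk identifies the row to re-import -->
<entity name="item" pk="id"
        query="select id, name from item"
        deltaQuery="select id from item where last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="select id, name from item where id = '${dataimporter.delta.id}'">
    <field column="id" name="id" />
    <field column="name" name="name" />
</entity>
```

Here deltaQuery returns the keys of rows changed since the last import, and deltaImportQuery fetches each of those rows. Each command is passed to the handler as the command request parameter, for example /dataimport?command=delta-import on the core's URL.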

