
Building a Solr index from a large text file

I have a large text file in the following format:

00001,234234|234|235|7345
00005,788|298|234|735

You can treat the values before the comma as keys. What I want is a quick-and-dirty way to query these keys and retrieve the result set for each one. After reading a bit, I found that Solr provides a good framework for this.

  • What would be the starting point?
  • Can I use Python to read the file and build this index (search engine) using Solr?
  • Is there a different mechanism for doing this?

You can definitely do that using pysolr, a Python library. If the data is in key-value form, you can read it in Python and post it to Solr as shown in the pysolr documentation: https://pypi.python.org/pypi/pysolr/3.1.0
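A minimal sketch of that approach, assuming a local Solr core named "keys" and an input file data.txt (both names are placeholders), and a schema that defines an id field and a multivalued values field:

import pysolr

# Placeholder URL and core name; point this at your own Solr instance.
solr = pysolr.Solr('http://localhost:8983/solr/keys', timeout=10)

docs = []
with open('data.txt') as f:  # placeholder file name
    for line in f:
        key, _, rest = line.strip().partition(',')
        if not key:
            continue
        # "id" holds the key, "values" the pipe-separated numbers.
        docs.append({'id': key, 'values': rest.split('|')})
        if len(docs) >= 1000:  # batch the adds so the large file is not held in memory
            solr.add(docs, commit=False)
            docs = []

if docs:
    solr.add(docs, commit=False)
solr.commit()  # make the indexed documents visible to searches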

To have more control over search, you can modify the schema.xml file so it defines fields matching the keys and values in your text file.

Once you have the data ingested into Solr, you can follow the link above to perform searches.
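For example, querying one of the keys with pysolr (the field names match the indexing sketch above):

results = solr.search('id:00001')
for doc in results:
    print(doc['id'], doc.get('values'))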

You can index your data directly into Solr using the UpdateCSV handler: you just need to specify the destination field names in the fieldnames parameter of your curl call (or add them as the first line of your file if that is easier). No custom code is needed.
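Such a curl call might look like the following (the core name "keys" and file name data.txt are placeholders). Here f.values.split=true together with f.values.separator=%7C (the URL-encoded pipe character) tells the handler to split the second column into multiple values:

curl 'http://localhost:8983/solr/keys/update/csv?commit=true&fieldnames=id,values&f.values.split=true&f.values.separator=%7C' \
     --data-binary @data.txt \
     -H 'Content-Type: text/plain; charset=utf-8'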

Do remember to check that the destination field for the |-separated values splits into tokens on that character.
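If you are not using the handler's split parameters shown above, one way to do that is a field type whose analyzer tokenizes on the pipe character. A schema.xml sketch (the type and field names here are illustrative):

<fieldType name="pipe_delimited" class="solr.TextField">
  <analyzer>
    <!-- split incoming text into tokens on "|" -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern="\|"/>
  </analyzer>
</fieldType>
<field name="values" type="pipe_delimited" indexed="true" stored="true"/>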

See https://wiki.apache.org/solr/UpdateCSV for details.
