Indexing plain text files in Solr

I'm having trouble finding a well-structured manual that explains how to index plain text (.txt) files in Solr.

I understand how to work with the standard Solr data formats, such as .xml or .json, but so far I have not found a single structured, fully described guide to plain text indexing (especially when the file contains no ids, only words and spaces).

I'd appreciate any sources that could help with this problem, or code examples that show how to do it.

You should still be able to use the extract endpoint (which uses Apache Tika in the background). You can provide field values through the query string, as seen in this example for the techproducts data set:

/solr/techproducts/update/extract?literal.id=doc1&commit=true

The literal.id=doc1 parameter supplies an actual value for a field that can't be extracted from the submitted file.
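As a rough illustration of how such a URL could be assembled for files that carry no id of their own, here is a minimal Python sketch. The helper name `extract_url`, the host `http://localhost:8983`, and the idea of passing in an id you chose yourself (for example the file name) are assumptions for illustration only; just the `/update/extract` path and the `literal.id` and `commit` parameters come from the example above.

```python
from urllib.parse import urlencode


def extract_url(base_url, core, doc_id, commit=True):
    """Build an /update/extract URL that supplies the id via literal.id.

    extract_url is a hypothetical helper, not part of Solr or any client library.
    """
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    # urlencode escapes characters such as spaces in the id.
    return f"{base_url}/solr/{core}/update/extract?{urlencode(params)}"


# Assumed local Solr address; adjust to your own installation.
print(extract_url("http://localhost:8983", "techproducts", "doc1"))
```

Any value that is unique per file could serve as the id here, since the plain text itself provides none.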

Make sure to set the Content-Type header to text/plain when you're submitting (unless you're submitting as a regular HTML form upload).
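A minimal sketch of such a submission, using only the Python standard library, might look like the following. The function names `build_text_request` and `post_text_file` are hypothetical, and the sketch assumes that the extract handler is enabled for your core and that Solr is reachable at the URL you pass in.

```python
from urllib.request import Request, urlopen


def build_text_request(url, body):
    """Create a POST request that sends raw bytes as text/plain."""
    return Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def post_text_file(url, path):
    """Read a .txt file and send it to the given extract URL.

    Needs a running Solr instance; returns the HTTP status code.
    """
    with open(path, "rb") as handle:
        request = build_text_request(url, handle.read())
    with urlopen(request) as response:
        return response.status
```

For instance, `post_text_file("http://localhost:8983/solr/techproducts/update/extract?literal.id=doc1&commit=true", "notes.txt")` would submit a hypothetical local file `notes.txt` under the id `doc1`, assuming Solr runs at that address.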

