I have a information retrieval assignment where I have to use elasticSearch to generate some indexing/ranking. I was able to download elasticSearch and it's now running on http://localhost:9200/
but how do I read every documents stored in my folder called ' data
'?
Elasticsearch is just a search engine. In order to get your docs and files searchable, you need to load them, extract all relevant data and load into elasticsearch.
Apache Tika is a solution for extracting the data out of the files. Write a file system crawler using Tika. Then use the Rest API to index the data.
If you don't want to reinvent the wheel, have a look on the FSCrawler project. Here is a blogpost describing how to solve a task you are facing.
Good luck!
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.