
Apache Solr: index a folder (and its subfolders)

I've googled a lot and I haven't found a good solution yet.

I want to index a folder that contains a lot of files and subfolders, but I don't understand how to do it. I assume a path has to be set somewhere in the config, but I haven't found one. Please don't roast me, I'm new to Solr. ;)

Try the post tool with the -Drecursive parameter.

Let's say the folder test contains two CSV files and one subfolder, test2, which contains a few more CSV files. The post tool recursively checks the folder test and its subfolder test2 and indexes all the files it finds.

java -Dtype=text/csv -Dc=collection1 -Drecursive -jar post.jar test
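To try this out without touching real data, the layout described above can be rebuilt in a scratch directory first. This is only a sketch: the file names a.csv, b.csv and c.csv and their contents are made up for illustration, and the find call simply lists the CSV files a recursive run would be expected to visit. It does not talk to Solr.

```shell
# Build the example layout in a temporary directory.
# The file names and CSV contents are illustrative assumptions.
workdir="$(mktemp -d)"
mkdir -p "$workdir/test/test2"
printf 'id,name\n1,alpha\n' > "$workdir/test/a.csv"
printf 'id,name\n2,beta\n'  > "$workdir/test/b.csv"
printf 'id,name\n3,gamma\n' > "$workdir/test/test2/c.csv"

# List every CSV file under test, including the subfolder test2.
csv_files="$(cd "$workdir" && find test -type f -name '*.csv' | LC_ALL=C sort)"
printf '%s\n' "$csv_files"
```

If the list looks right, running the post.jar command above from inside the scratch directory should pick up the same files.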

The -Dauto parameter indexes all file types that Tika can process.

java -Dauto -Dc=collection1 -Drecursive -jar post.jar test

FileListEntityProcessor can be used for indexing file paths. Details can be found at https://wiki.apache.org/solr/DataImportHandler#FileListEntityProcessor

Perfect, thank you guys, indexing worked!

But I saw that Solr is not what I need. I need a document/index server in which I can index my shared folders, with around 4 GB of data, and I need a user-friendly search GUI... Solr is not really like that.

1) Create a core from a configset

cd $solr_home

cd server/solr/configsets

mkdir download_search

cp -r _default/. download_search

# create a solr core with default configs

curl -X GET 'http://localhost:8983/solr/admin/cores?action=CREATE&name=download_search&instanceDir=configsets/download_search'

# get current schema fields

curl -X GET "http://localhost:8983/solr/download_search/schema/fields"

2) Create a schema.xml file and add the fields of the CSV/JSON data to it (the field types used must be defined in the schema)

    <field name="Gender" type="string" indexed="true" stored="true" />
    <field name="User ID" type="string" indexed="true" stored="true" />
    <field name="Age" type="int" indexed="true" stored="true" />
    <field name="EstimatedSalary" type="float" indexed="true" stored="true" />
    <field name="Purchased" type="int" indexed="false" stored="true" multiValued="true" />

    <copyField source="Gender" dest="Gender_str" />
    <copyField source="Purchased" dest="Purchased_str" />
    <copyField source="Age" dest="Age_str" />
    <copyField source="EstimatedSalary" dest="EstimatedSalary_str" />
    <copyField source="User ID" dest="User_str" />
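If the core uses a managed schema, a field can probably also be added through the Schema API instead of editing the file by hand. The sketch below is an assumption-laden alternative, not part of the steps above: it only builds the JSON for the Gender field and prints it, and it sends the request only when the hypothetical RUN_AGAINST_SOLR variable is set to yes. The host, port and core name are taken from the commands above.

```shell
# Base URL of the core created in step 1.
core_url="http://localhost:8983/solr/download_search"

# JSON body for the Schema API "add-field" command (one field as an example).
payload='{"add-field": {"name": "Gender", "type": "string", "indexed": true, "stored": true}}'
echo "$payload"

# RUN_AGAINST_SOLR is a made-up switch for this sketch, so that nothing is
# sent unless a Solr instance is known to be running.
if [ "${RUN_AGAINST_SOLR:-no}" = "yes" ]; then
  curl -X POST -H 'Content-type: application/json' \
       --data-binary "$payload" "$core_url/schema"
fi
```

Afterwards, the GET request on schema/fields from step 1 should list the new field.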
   
 
3) Index the Downloads folder using post.jar

$ java -Dtype=text/csv -Dc=download_search -Drecursive -jar post.jar /home/amit/Downloads
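To check that something actually arrived in the core, a match-all query with rows=0 should return just the document count. This is a sketch under the same assumptions as the commands above (local Solr on port 8983, core download_search). RUN_AGAINST_SOLR is a made-up switch, so the script only prints the URL unless it is set to yes.

```shell
# Select endpoint of the core; q=*:* matches every document, and rows=0
# asks for the count (numFound) without returning the documents themselves.
solr_url="http://localhost:8983/solr/download_search"
query_url="$solr_url/select?q=*:*&rows=0"
echo "$query_url"

# Only contact Solr when explicitly requested (hypothetical switch).
if [ "${RUN_AGAINST_SOLR:-no}" = "yes" ]; then
  curl -s "$query_url"
fi
```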

