
Importing from multiple text files in Solr

I have two text files, call them A.txt and B.txt. Both A.txt and B.txt have 3 fields. But the semantics are different. Let's name these fields as follows:

A.txt : f1, f2, f3
B.txt : f1, f2, f4

A.txt and B.txt have the same values for f1 and f2, but a different third field.

I would like to import these files into Solr (I'm using Solr 4.5). The caveat is that corresponding entries from A.txt and B.txt have to be combined into a single document. So for example, if we have:

A.txt
1,50,foo
51,100,bar

B.txt
1,50,xkcd
51,100,qc

After dataimport has happened, there should be 2 documents in Solr:

1,50,foo,xkcd
51,100,bar,qc

If the documents were stored in SQL databases, it would be a simple join query. But since the docs are stored as lines in a CSV file, and I am using LineEntityProcessor with a transformer function to do the dataimport, is there a way of accomplishing this task?

I would like to import these files into Solr (I'm using Solr 4.5). But the caveat is that corresponding entries from A.txt and B.txt have to be combined into one single document... After dataimport has happened, there should be 2 documents in Solr

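This won't work as you intend. By default, Solr treats a document update as the functional equivalent of a transactional delete and insert. If both imports produce the same unique key, the rows imported from B.txt would replace the documents built from A.txt instead of adding a field to them.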

Solr 4 introduced atomic updates, which let you change some fields without affecting the rest of the document's field data. They come with limitations: each update has to carry metadata describing the operation, and it has to be sent as XML or JSON. The text file structures you've described won't work with this as they are.
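To give a sense of what that metadata looks like, here is a rough sketch in Python that builds the JSON body for an atomic "set" update. The unique key field name `id`, the key format `1-50`, and the helper name `atomic_set_payload` are assumptions for illustration only; your schema may differ. As far as I know, atomic updates also need the update log enabled and the other fields to be stored.

```python
import json

def atomic_set_payload(doc_id, field, value):
    """Build a JSON body asking Solr to set one field on an existing document.

    Assumes the schema's unique key field is called "id" (hypothetical).
    """
    # The {"set": value} wrapper is the update metadata: it tells Solr to
    # replace only this field instead of the whole document.
    return json.dumps([{"id": doc_id, field: {"set": value}}])

if __name__ == "__main__":
    # Hypothetical key "1-50" built from f1 and f2 of the first sample row.
    print(atomic_set_payload("1-50", "f4", "xkcd"))
```

Every line of B.txt would have to be converted into a payload like this and posted to the update handler, which is why the plain text files cannot be used directly.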

My suggestion: save yourself the headache and write a file merge script that combines your text files to produce the record you want prior to storing it in Solr.
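A minimal sketch of such a merge script in Python, assuming both files are comma-separated, that the (f1, f2) pair identifies a row, and that B.txt fits in memory. The function names and the output file name `merged.txt` are made up for the example.

```python
import csv

def merge_records(a_lines, b_lines):
    """Join A rows (f1,f2,f3) with B rows (f1,f2,f4) on the (f1, f2) pair."""
    # Index B by its first two fields so each A row can look up its partner.
    b_index = {}
    for row in csv.reader(b_lines):
        if len(row) >= 3:
            b_index[(row[0], row[1])] = row[2]

    merged = []
    for row in csv.reader(a_lines):
        if len(row) < 3:
            continue  # skip blank or malformed lines
        key = (row[0], row[1])
        # Rows without a partner in B are dropped, like an inner join.
        if key in b_index:
            merged.append([row[0], row[1], row[2], b_index[key]])
    return merged

def write_merged(a_path, b_path, out_path):
    """Read both files, merge them, and write one combined CSV."""
    with open(a_path, newline="") as a, open(b_path, newline="") as b:
        rows = merge_records(a, b)
    with open(out_path, "w", newline="") as out:
        csv.writer(out).writerows(rows)

if __name__ == "__main__":
    write_merged("A.txt", "B.txt", "merged.txt")
```

The merged file has four fields per line (f1, f2, f3, f4), so the existing LineEntityProcessor setup should only need its transformer adjusted to split out the extra field.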
