
Indexing an XML file in Apache Solr

I am very new to Apache Solr. I read the post "Need help indexing XML files into Solr using DataImportHandler" before asking this question, but it didn't help much given how little I know so far. I want to index an XML file and search its contents. Its structure looks something like this:

<entry id="REACT_142474" acc="REACT_142474.5">
<name>((1,6)-alpha-glucosyl)poly((1,4)-alpha-glucosyl)glycogenin =&gt; poly{(1,4)-alpha-      glucosyl} glycogenin + alpha-D-glucose</name>
<description>This event has been computationally inferred from an event that has been demonstrated in another species.The inference is based on the homology mapping in Ensembl Compara. Briefly, reactions for which all involved PhysicalEntities (in input, output and catalyst) have a mapped orthologue/paralogue (for complexes at least 75% of components must have a mapping) are inferred to the other species. High level events are also inferred for these events to allow for easier navigation.More details and caveats of the event inference in Reactome. For details on the Ensembl Compara system see also: Gene orthology/paralogy prediction method.</description>
<dates>
<date type="creation" value="06-JUN-2013"/>
<date type="last_modification" value="06-JUN-2013"/>
</dates>
<cross_references>
<ref dbname="ChEBI" dbkey="17925"/>
<ref dbname="UniProt" dbkey="Q06625"/>
<ref dbname="ChEBI" dbkey="18291"/>
<ref dbname="UniProt" dbkey="P47011"/>
<ref dbname="UniProt" dbkey="P36143"/>
<ref dbname="GO" dbkey="GO:0004135"/>
<ref dbname="taxonomy" dbkey="4932"/>
</cross_references>
<additional_fields>
<field name="organism">Saccharomyces cerevisiae</field>
</additional_fields>
</entry>

Is it essential to use the DataImportHandler (DIH) to import this data into Solr, or is there a simpler way to accomplish the task? Can it be done through SolrJ? I am fine with printing the results to the console. It would be really helpful if someone could point me to some useful examples or resources on this, apart from the official documentation.

The following are Groovy examples which parse and then index XML files using SolrJ:

I used the link you posted to apply the XPathEntityProcessor to my own data. I was a newbie at the time, but it wasn't that difficult.

If you want to use SolrJ, then look at this link for an example. I would assume that you could parse your XML with whatever XML parser you prefer and then use SolrJ to add the resulting documents to your index.
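To make the "parse it yourself, then add documents" idea concrete, here is a minimal sketch in Python rather than SolrJ. It follows the same pattern but sends the documents to Solr's JSON update handler over plain HTTP instead of going through the SolrJ client. The function names and the flattened field names (`date_creation`, `ref_UniProt`, and so on) are my own assumptions, not anything Solr requires; your schema would need matching fields, or dynamic fields that cover them.

```python
import json
import urllib.request
import xml.etree.ElementTree as ET


def entry_to_doc(entry):
    """Flatten one <entry> element into a dict of Solr field values."""
    doc = {
        "id": entry.get("id"),
        "acc": entry.get("acc"),
        # Collapse the stray runs of whitespace inside <name>.
        "name": " ".join(entry.findtext("name", default="").split()),
        "description": entry.findtext("description", default=""),
    }
    # Dates become one field per type, e.g. date_creation (assumed naming).
    for date in entry.iter("date"):
        doc["date_" + date.get("type")] = date.get("value")
    # Cross references become multi-valued fields, e.g. ref_UniProt.
    for ref in entry.iter("ref"):
        doc.setdefault("ref_" + ref.get("dbname"), []).append(ref.get("dbkey"))
    # Additional fields keep their own name, e.g. organism.
    for field in entry.iter("field"):
        doc[field.get("name")] = field.text
    return doc


def parse_entries(xml_text):
    """Return one dict per <entry>, whether it is the root or nested."""
    root = ET.fromstring(xml_text)
    return [entry_to_doc(entry) for entry in root.iter("entry")]


def index_docs(docs, update_url):
    """POST the documents as a JSON array to Solr's update handler."""
    request = urllib.request.Request(
        update_url,
        data=json.dumps(docs).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Assuming a core named `mycore` on a local Solr (both hypothetical), usage would look like `index_docs(parse_entries(open("entries.xml").read()), "http://localhost:8983/solr/mycore/update?commit=true")`. Note that values such as `06-JUN-2013` are passed through as plain strings; they would have to be converted to ISO 8601 before being stored in a Solr date field.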

