
Indexing an XML file into Solr using XPathEntityProcessor - unable to index some tags

Using the XPathEntityProcessor, I am indexing the following XML file (just an example):

<shop>
 <vegitable>
 ....
 </vegitable>
 <fruit>
      <property>
        <kind>apple</kind>
        <value>3.08</value>
        <from>USA</from>
      </property>
      <property>
        <kind>banana</kind>
        <value>8.5</value>
        <from>CA</from>
      </property>  
      <property>
        <kind>painaple</kind>
        <value>102.8</value>
        <from>CA</from>
      </property>
 </fruit>
 ....
 ....
 ....
 </shop>

I want to store the apple property's values in one field and all other properties in another field, so that I can use them for display purposes. Below is my solr-config.xml file, but Solr doesn't populate these two fields.

<dataConfig>
        <dataSource type="FileDataSource" encoding="UTF-8" />
        <document>
        <entity name="drug"
                processor="XPathEntityProcessor"
                stream="true"
                forEach="/shop/"
                url="/data/shop.xml"
                transformer="RegexTransformer,DateFormatTransformer"
                >
                ....
                ....
            <field column="apple-imported-form" xpath="/shop/fruit/property/[kind='apple']/from"/>
            <field column="apple-imported-value" xpath="/shop/fruit/property/[kind='apple']/value"/>
        </entity>
       </document>
</dataConfig>

While reading the Solr documentation on XPathEntityProcessor, I found the following lines:

The XPathEntityProcessor implements a streaming parser which supports a subset of xpath syntax. Complete xpath syntax is not supported but most of the common use cases are covered.

But nothing is mentioned about which parts of XPath are not covered. Please guide me.

Thanks in advance!!

I found this in documentation: https://wiki.apache.org/solr/DataImportHandler

The XPathEntityProcessor implements a streaming parser which supports a subset of xpath syntax. Complete xpath syntax is not supported but most of the common use cases are covered as follows:

   xpath="/a/b/subject[@qualifier='fullTitle']"
   xpath="/a/b/subject/@qualifier"
   xpath="/a/b/c"
   xpath="//a/..."
   xpath="/a//b..."

I also tried the xpath below, but it didn't work (my Solr version is 5.2):

   xpath="/a/b/subject[@qualifier='fullTitle']/id"

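It seems the attribute qualifier has to come last in the path. Note also that a predicate on a child element, such as [kind='apple'], is not among the supported forms listed above, so it is probably not supported either.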
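If the supported subset cannot express the filter, one possible workaround is to pre-process the XML outside Solr and index the flattened result. The following is only a minimal sketch of that idea in Python, not a DataImportHandler feature: the function name `extract_apple_fields` and the output keys `apple_from` and `apple_value` are hypothetical, and the sample is a trimmed copy of the XML from the question. It relies on the standard library's ElementTree, whose limited XPath support does include the `[child='text']` predicate.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample from the question (assumption: same layout).
SAMPLE = """<shop>
  <fruit>
    <property><kind>apple</kind><value>3.08</value><from>USA</from></property>
    <property><kind>banana</kind><value>8.5</value><from>CA</from></property>
  </fruit>
</shop>"""


def extract_apple_fields(xml_text):
    """Return the apple's <from> and <value> as a flat dict (hypothetical keys)."""
    root = ET.fromstring(xml_text)  # root is the <shop> element
    # [kind='apple'] selects the <property> whose <kind> child has text 'apple'.
    apple = root.find("./fruit/property[kind='apple']")
    if apple is None:
        return {}
    return {
        "apple_from": apple.findtext("from"),
        "apple_value": apple.findtext("value"),
    }


if __name__ == "__main__":
    print(extract_apple_fields(SAMPLE))
```

The resulting dict could then be written out as a simpler XML file or sent to Solr by whatever indexing route you already use; that part is left out because it depends on your setup.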

