
Solr: Using DataImportHandler for XML import with XSLT processing

I am having serious problems configuring the Solr 4.10.3 DataImportHandler (DIH) to import XML files. I have been trying for hours without luck. Here is my configuration:

<dataConfig>
  <dataSource encoding="UTF-8" 
    type="FileDataSource" basePath="/path/to/my/cores/root/myCoreName/"/>
  <document>
    <entity
        name="pickupdir"
        processor="FileListEntityProcessor"
        rootEntity="false"
        fileName=".*\.xml"
        baseDir="/import"
        recursive="true"
        newerThan="${dataimporter.last_index_time}"
    />

    <entity 
        name="xml"
        processor="XPathEntityProcessor"
        datasource="pickupdir"
        stream="true"
        useSolrAddSchema="true"
        url="${pickupdir.fileAbsolutePath}"
        xsl="solr.xsl"
    />
  </document>
</dataConfig>

The XSLT stylesheet "solr.xsl" transforms the XML files into the Solr import format, so I've set useSolrAddSchema="true". However, when I try to run this data import from the browser admin console, I keep getting the error:

java.io.FileNotFoundException: Could not find file:  (resolved to: /path/to/my/cores/root/myCoreName/

A few things are not clear to me here:

  • The error message doesn't say exactly which file it was looking for.
  • Why does it say "could not find file" when it is looking for a directory?
  • If I understand the "basePath" attribute of dataSource correctly, this will be the basis for resolving relative paths given in the entity element. So, the baseDir "/import" would get resolved to "/path/to/my/cores/root/myCoreName/import". But this doesn't seem to be happening correctly.
  • How would I configure the paths to use relative paths to the solr root instead of absolute paths?

Maybe someone can point me to some working examples for XML imports using XSLT and DIH. I would like to stick with the XSLT, because that's working already (I've tested the import before with the Simple Post Tool).

Cheers,

Martin

As per the documentation, try adding the dataSource="null" attribute to the outer entity. Without that attribute, it picks up the first data source declared, which is your FileDataSource.

You also seem to have forgotten to close the second entity.
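
For reference, here is a minimal sketch of how the configuration might look with those points applied. It is untested against 4.10.3 and rests on a few assumptions: the data source name "files" is made up for illustration, and the absolute baseDir is simply the location the question expected "/import" to resolve to. The XPath entity is nested inside the file-listing entity so that ${pickupdir.fileAbsolutePath} is filled in once per file found. With the two entities as siblings, that variable probably resolves to an empty string, which would explain why the error shows no file name and only the basePath. As far as I can tell, basePath on FileDataSource applies to relative values of url, not to baseDir, so it is dropped here because fileAbsolutePath is already absolute.

```xml
<dataConfig>
  <!-- "files" is an arbitrary name chosen for this sketch -->
  <dataSource name="files" type="FileDataSource" encoding="UTF-8"/>
  <document>
    <!-- Outer entity only lists files, so it is given no data source -->
    <entity
        name="pickupdir"
        processor="FileListEntityProcessor"
        dataSource="null"
        rootEntity="false"
        fileName=".*\.xml"
        baseDir="/path/to/my/cores/root/myCoreName/import"
        recursive="true"
        newerThan="${dataimporter.last_index_time}">

      <!-- Nested so ${pickupdir.fileAbsolutePath} is set for each file -->
      <entity
          name="xml"
          processor="XPathEntityProcessor"
          dataSource="files"
          stream="true"
          useSolrAddSchema="true"
          url="${pickupdir.fileAbsolutePath}"
          xsl="solr.xsl"/>
    </entity>
  </document>
</dataConfig>
```

Note that the attribute is spelled dataSource (camel case) and should name a data source, not another entity, so datasource="pickupdir" from the question is replaced with a reference to the named FileDataSource.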
