Solr: Get DataImportHandler to ignore missing elements

I'm trying to use DIH to import data from an XML source I do not maintain. The XML groups optional elements under an attributes element, for example Color or Flavor. Not every entity has every attribute, which is perfectly fine and valid. Sadly, DIH skips the entities that are missing some of them, and I still want those imported. This is my data-config.xml:

<dataConfig>
  <dataSource type="FileDataSource" name="datasource"/>
  <document>
   <entity
     name="files"
     processor="FileListEntityProcessor"
     baseDir="C:\\"
     fileName="recipe_page.*xml"
     recursive="false"
     rootEntity="false"
     dataSource="null">
     <entity
      name="file"
      processor="XPathEntityProcessor"
      url="${files.fileAbsolutePath}"
      forEach="/results|/results/recipe"
      stream="true"
      transformer="TemplateTransformer">
       <field column="recipe_id" xpath="/results/recipe/recipeID" />
       <field column="recipe_title" xpath="/results/recipe/recipeTitle" />     
       <field column="color" xpath="/results/recipe/attributes/Color" default="" />
       <field column="drink_classification" xpath="/results/recipe/attributes/DrinkClassification" default="" />
       <field column="flavor" xpath="/results/recipe/attributes/Flavor" default="" />        
       <field column="uid" template="recipe_${file.recipe_id}" />
       <field column="document_type" template="recipe" />
    </entity>
   </entity>
  </document>
</dataConfig>

How can I tell DIH to ignore missing elements, or at least set default values for them?

I'm not sure how to make XPathEntityProcessor ignore XPath expressions that match nothing, but one idea is to set default values for the fields with a Transformer, in particular TemplateTransformer:

You can use the template transformer to construct or modify a field value, perhaps using the value of other fields. You can insert extra text into the template.

I expect something like this should work for you:

<entity name="en" pk="id" transformer="TemplateTransformer" ...>

  <!-- set the color field to a fixed template value -->
  <field column="color" template="default-color" />
</entity>

Here I'm setting the field to a fixed value. I'm not sure this will help, though: I expect the template to overwrite the value even when one already exists, so it should be double-checked.

If that turns out to be the case, you can write a custom Transformer instead:

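import java.util.Map;

public class DefaultValueTransformer {
        // called for each row produced by the entity
        public Object transformRow(Map<String, Object> row) {
                Object color = row.get("color");
                // fill in the default only when the element was missing
                if (color == null)
                        row.put("color", "default-color-value");

                return row;
        }
}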

and then use it like this:

<entity name="entity" query="..." transformer="my.package.DefaultValueTransformer">
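The same idea can be extended to all three optional columns from the question. This is only a sketch: the class name MissingFieldDefaults, the column list and the empty-string default are assumptions for illustration, and it relies on DIH accepting a plain class with a transformRow(Map) method, as the transformer above does.

```java
import java.util.Map;

// Hypothetical transformer: fills in a default for every optional column
// that the XPath expressions did not populate.
public class MissingFieldDefaults {
    // Column names taken from the data-config.xml in the question.
    private static final String[] OPTIONAL_COLUMNS = {
        "color", "drink_classification", "flavor"
    };

    // Assumed default; the question uses default="" on these fields.
    private static final String DEFAULT_VALUE = "";

    public Object transformRow(Map<String, Object> row) {
        for (String column : OPTIONAL_COLUMNS) {
            // Leave existing values alone; only fill in missing ones.
            if (row.get(column) == null) {
                row.put(column, DEFAULT_VALUE);
            }
        }
        return row;
    }
}
```

Because it only touches columns that are null, values that were actually present in the XML pass through unchanged, which is the part the TemplateTransformer approach above is unsure about.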
