Add multiValued field to a SolrInputDocument

We are using an embedded Solr instance with SolrJ (Java).

I want to add a multivalued field to a document. The value for the multivalued field is a comma-separated String.

In Java I want to do:

solrInputDocument.addField(Field1, "value1,value2,value3");

The definition of Field1 in the schema is as follows:

<field name="Field1" type="multiValuedField"   indexed="true"  stored="true"  multiValued="true" required="false"/>

<fieldType name="multiValuedField" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="index">
         <tokenizer class="solr.ClassicTokenizerFactory"/>
     </analyzer>
</fieldType> 

With this configuration, we expected that when we invoke the addField method, Solr would recognize Field1 as a multivalued field and convert the String into an ArrayList of the individual values.

Instead, we get an ArrayList with a single value, which is the original string that was added to the document.

Question: should the tokenizer take care of this, or should we split the values ourselves when adding multivalued fields to the document?

Thanks.

The addField method of SolrInputDocument accepts a string and an object. So to handle multivalued fields, you can pass in an ArrayList with your desired values for the second parameter, and SolrJ will update the multivalued field accordingly:

String[] valuesArray = {"value1", "value2", "value3"};
ArrayList<String> values = new ArrayList<String>(Arrays.asList(valuesArray));
solrInputDocument.addField("Field1", values);

You can call SolrInputDocument.addField(String name, Object value) either multiple times passing an Object as the value or a single time passing a Collection as the value.

Example #1:

List<String> values = Arrays.asList("value1", "value2", "value3");
solrInputDocument.addField("field", values);

Example #2:

solrInputDocument.addField("field", "value1");
solrInputDocument.addField("field", "value2");
solrInputDocument.addField("field", "value3");

Both of these examples produce the same result. You can even mix and match the calls if you need to. To see why this works, trace the calls into the Solr source code and you'll find that the multi-valued cases are handled in SolrInputField.addValue(Object v, float b):

/**
 * Add values to a field.  If the added value is a collection, each value
 * will be added individually.
 */
@SuppressWarnings("unchecked")
public void addValue(Object v, float b) {
  if( value == null ) {
    if ( v instanceof Collection ) {
      Collection<Object> c = new ArrayList<Object>( 3 );
      for ( Object o : (Collection<Object>)v ) {
        c.add( o );
      }
      setValue(c, b);
    } else {
      setValue(v, b);
    }

    return;
  }

  boost *= b;

  Collection<Object> vals = null;
  if( value instanceof Collection ) {
    vals = (Collection<Object>)value;
  }
  else {
    vals = new ArrayList<Object>( 3 );
    vals.add( value );
    value = vals;
  }

  // Add the new values to a collection
  if( v instanceof Iterable ) {
    for( Object o : (Iterable<Object>)v ) {
      vals.add( o );
    }
  }
  else if( v instanceof Object[] ) {
    for( Object o : (Object[])v ) {
      vals.add( o );
    }
  }
  else {
    vals.add( v );
  }
}
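To try the flattening behaviour without a Solr dependency, here is a minimal sketch that mirrors the logic above. FieldValueModel is a hypothetical stand-in written for illustration, not a Solr class, and it leaves out the boost handling. It shows that adding a list once and adding the values one by one end up with the same contents, while a comma-separated String stays a single value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for illustration only; this is not the Solr class. */
final class FieldValueModel {
    private Object value; // null, a single object, or a Collection<Object>

    @SuppressWarnings("unchecked")
    void addValue(Object v) {
        if (value == null) {
            // First call: copy a collection, or store the single object as-is.
            value = (v instanceof Collection)
                    ? new ArrayList<Object>((Collection<Object>) v)
                    : v;
            return;
        }
        // Later calls: promote a single stored value to a collection if needed.
        Collection<Object> vals;
        if (value instanceof Collection) {
            vals = (Collection<Object>) value;
        } else {
            vals = new ArrayList<Object>();
            vals.add(value);
            value = vals;
        }
        // Flatten iterables and arrays; anything else is added as one value.
        if (v instanceof Iterable) {
            for (Object o : (Iterable<Object>) v) vals.add(o);
        } else if (v instanceof Object[]) {
            for (Object o : (Object[]) v) vals.add(o);
        } else {
            vals.add(v);
        }
    }

    @SuppressWarnings("unchecked")
    List<Object> getValues() {
        if (value == null) return Collections.emptyList();
        if (value instanceof Collection) return new ArrayList<Object>((Collection<Object>) value);
        return Collections.<Object>singletonList(value);
    }

    public static void main(String[] args) {
        FieldValueModel asList = new FieldValueModel();
        asList.addValue(Arrays.asList("value1", "value2", "value3"));

        FieldValueModel oneByOne = new FieldValueModel();
        oneByOne.addValue("value1");
        oneByOne.addValue("value2");
        oneByOne.addValue("value3");

        FieldValueModel commaString = new FieldValueModel();
        commaString.addValue("value1,value2,value3");

        System.out.println(asList.getValues().equals(oneByOne.getValues())); // prints true
        System.out.println(commaString.getValues().size());                  // prints 1
    }
}
```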

As I am not using SolrJ to add documents to Solr, I am not really sure, but I think you should have used:

solrInputDocument.addField(Field1, "value1");
solrInputDocument.addField(Field1, "value2");
solrInputDocument.addField(Field1, "value3");

Confirmed. Tokenizers don't "cast" the data for you. So the approach is to process the data at load time so that it is already in the proper format.
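One way to do that processing at load time is to split the comma-separated String yourself before handing it to SolrJ. The following is a minimal sketch using only the JDK; the class MultiValueSplitter and the method splitValues are hypothetical names chosen for this example, and trimming whitespace and skipping empty entries are assumptions about the data, not something Solr requires.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of SolrJ. */
final class MultiValueSplitter {
    private MultiValueSplitter() {}

    /** Splits a comma-separated string into trimmed, non-empty values. */
    static List<String> splitValues(String raw) {
        List<String> values = new ArrayList<>();
        if (raw == null) {
            return values; // nothing to add for a missing field
        }
        for (String part : raw.split(",")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                values.add(trimmed);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = splitValues("value1,value2,value3");
        System.out.println(values); // prints [value1, value2, value3]

        // With SolrJ on the classpath, the list would then be passed as:
        // solrInputDocument.addField("Field1", values);
    }
}
```

The resulting list can be passed to addField in a single call, or each element can be added with its own call, as described in the answers above.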

Thanks for your help.
