
Apache Solr Field data type change from Strings

One of the fields in my Solr core uses the strings data type, but it cannot accommodate the length of the value I need to store. I would like to change it to some other data type that can hold longer strings.

Unfortunately, text_general does not help, as it is similar to string rather than strings. Is there any other data type that could help?

Whether a field is multiValued or not (which is what you're describing) is configured with a default value on the field type, but that value can be overridden for each field you define. So the difference between string and strings is just that the latter has multiValued="true" as the default, while string has multiValued="false" as the default.
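To make the comparison concrete, here is a sketch of how the two field types are often declared in a schema. It is modeled on the definitions shipped with Solr's default configset, but the extra attributes shown (sortMissingLast, docValues) are assumptions that vary between versions, so check your own schema rather than relying on this:

```xml
<!-- Single-valued unless a field overrides it -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Same class; only the multiValued default differs -->
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" docValues="true" multiValued="true"/>
```

Both types use solr.StrField, so neither one analyzes or tokenizes its values.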

When actually defining the field, you can override this to say whether your document allows a specific field to be multiValued, regardless of what the field type definition says.

<field name="string_field" type="string" multiValued="true"/>

would behave the same as the strings field type, since it explicitly allows the field to hold multiple values.

So in your case, you can use text_general - it might not be set to multiValued by default, but you can configure that when you define the field.

<field name="your_field_name" type="text_general" multiValued="true" />

The difference between text_general and string is that text_general has an analysis chain and tokenizer applied to it, so that the text is split internally into smaller tokens.
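As a rough illustration, an analyzed field type declares a tokenizer and optional filters, which a string type does not have. The definition below is a simplified sketch under a hypothetical name (text_example), not the actual text_general definition from any particular Solr release; the shipped one typically has separate index and query analyzers and additional filters:

```xml
<!-- Hypothetical, simplified field type; compare with text_general in your own schema -->
<fieldType name="text_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits the value into word tokens -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lowercases each token so matching is case-insensitive -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With a chain like this, a long value is indexed as many short tokens instead of one large one.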

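Lucene has a hard limit of 32,766 bytes per token, measured on the UTF-8 encoded value rather than in characters. A string field indexes the whole value as a single token, so this is the limit you're hitting when indexing a larger value into a string field.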

If you're going to store large blobs in Solr, I'd probably recommend putting them in Amazon S3 or another data store instead, and storing only the generated ID in Solr. That way the index size is kept lower and you remove overhead when merging segments.
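A field for that approach might look like the following sketch. The field name blob_id is hypothetical, and how the ID is generated and resolved against the external store is left to the application:

```xml
<!-- Hypothetical field: key of the object in S3 or another store.
     The key is short, so a plain string type stays well under the token limit. -->
<field name="blob_id" type="string" indexed="true" stored="true"/>
```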

text_general can hold multiple values, much like an array of strings. So if you are looking for a type similar to the strings data type, which behaves like an array, text_general should do.

Another advantage of text_general is that it allows tokenization; strings does not.


 