Apache Solr: change a field's data type from strings

One of the fields in my Solr core uses the data type strings, but it cannot accommodate the length of the values I'm putting into the field. So I wish to change it to some other data type that can accommodate these strings.

Unfortunately text_general does not help, as it is similar to string and not to strings. Is there any other data type that could help?

Whether a field is multiValued or not (which is what you're describing) is configured with a default value on the field type, but that value can be overridden for each field you define. So the difference between string and strings is just that the latter has multiValued="true" as its default, while string has multiValued="false" as its default.
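As a sketch of what that looks like in the schema, here are the two field types roughly as they appear in Solr's _default configset (exact attributes can differ between Solr versions, so check your own managed-schema). Both use the same StrField class; only the multiValued default differs:

```xml
<!-- Single-valued by default: a field of this type holds at most one value per document. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" docValues="true"/>

<!-- Same class and options, but multiValued="true" is the default for fields of this type. -->
<fieldType name="strings" class="solr.StrField" sortMissingLast="true" multiValued="true" docValues="true"/>
```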

When actually defining the field, you can override this to state whether your documents allow that specific field to have multiple values, regardless of what the field type definition says:

<field name="string_field" type="string" multiValued="true"/>

would behave the same as the strings field type, since it explicitly allows the field to hold multiple values.

So in your case you can use text_general: it might not be set to multiValued by default, but you can configure that when you define the field:

<field name="your_field_name" type="text_general" multiValued="true" />

The difference between text_general and string is that text_general has an analysis chain (a tokenizer plus filters) applied to it, so the text is split internally into smaller tokens.
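For illustration, a simplified text_general-style type might look like the sketch below. It is modeled on the example schemas that ship with Solr, which additionally use stop-word and synonym filters and separate index and query analyzers; the type name text_general_sketch is hypothetical:

```xml
<!-- Hypothetical, simplified type: a TextField runs every value through an analyzer. -->
<fieldType name="text_general_sketch" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Splits the text into word tokens on word boundaries. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Lower-cases each token so matching is case-insensitive. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because each indexed term is a single word rather than the whole value, long values do not produce one oversized term the way they do with a string field.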

Lucene has a hard limit of 32766 bytes (UTF-8 encoded) per indexed term. A string field indexes the entire value as a single term, so this is the limit you're hitting when indexing a larger value into a string field.

If you're going to store large blobs in Solr, I'd probably recommend putting them in Amazon S3 or another data store instead, and storing only the generated id in Solr. That way the index stays smaller, and you remove overhead when merging segments.
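A minimal sketch of the schema side of that approach, assuming the blob itself lives in an external store and Solr keeps only the key needed to fetch it; the field name blob_key is hypothetical:

```xml
<!-- Hypothetical field: holds only the external store's key (e.g. an S3 object key),
     which is short enough to index as a single string term. -->
<field name="blob_key" type="string" indexed="true" stored="true"/>
```

The application then reads blob_key from the search results and fetches the full value from the external store.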

text_general can hold an array of strings. So if you are looking for a type that behaves like an array, similar to the strings data type, text_general should do.
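To show what "array" means here: with a multiValued field, the same field simply appears more than once in a document. A sketch in Solr's XML update format, reusing your_field_name from the field definition earlier on this page and assuming the schema's unique key field is called id (the id value is hypothetical):

```xml
<add>
  <doc>
    <field name="id">doc-1</field>
    <!-- Each repeated field element adds another value to the multiValued field. -->
    <field name="your_field_name">first value</field>
    <field name="your_field_name">second value</field>
  </doc>
</add>
```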

Another advantage of text_general is that it is tokenized; strings is not.

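Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.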

© 2020-2024 STACKOOM.COM