
DynamicFields in Solr

In my current project I need to index all e-mails and their attachments from multiple mailboxes.

I will use Solr, but I don't know the best way to structure my index. My first approach was:

<fields>
  <field name="id" required="true"/>
  <field name="uid" required="true"/>
  <!-- A lot of other fields -->
  <dynamicField name="attachmentName_*" required="false"/>
  <dynamicField name="attachmentBody_*" required="false"/>
</fields>

But now I am not sure this is the best structure. I don't think I can search for one term (e.g. stackoverflow) and find out which field it matched in (e.g. attachmentBody_1, attachmentBody_2, attachmentBody_3, etc.) with a single query.

Does anyone have a better suggestion for my index structure?

You can use multiValued fields for attachmentName and attachmentBody, so you would have two regular fields instead of dynamic fields. You can then use highlighting to bring back the specific values that match, along with their surrounding context.
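As a rough sketch of that suggestion, the two fields could be declared as below. The `text_general` type is an assumption (it ships with Solr's sample configsets, but your schema may name its text type differently), and `stored="true"` is there because highlighting needs the original text.

```xml
<!-- schema.xml sketch: one multiValued field per kind of attachment data. -->
<!-- "text_general" is an assumed field type; substitute your own text type. -->
<field name="attachmentName" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="attachmentBody" type="text_general" indexed="true" stored="true" multiValued="true"/>
```

A query such as `q=attachmentBody:stackoverflow&hl=true&hl.fl=attachmentBody` should then return the matching snippets in the highlighting section of the response. Raising `hl.snippets` may help when several attachments of the same email match.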

Another option is to make each attachment a separate document and store something that identifies which email it belongs to. The downside of this approach is that you may need to index data from the email itself several times, once per attachment. But this is really only a problem if most of the email messages have more than one attachment.
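For illustration, an XML update message for the one-document-per-attachment layout might look like the sketch below. The `docType` and `emailId` field names and all of the values are hypothetical; any field that links an attachment back to its email would do.

```xml
<add>
  <!-- The email itself; "docType" is a hypothetical discriminator field. -->
  <doc>
    <field name="id">email-42</field>
    <field name="uid">42</field>
    <field name="docType">email</field>
  </doc>
  <!-- One document per attachment; "emailId" (hypothetical) points back to the parent email. -->
  <doc>
    <field name="id">email-42-att-1</field>
    <field name="docType">attachment</field>
    <field name="emailId">email-42</field>
    <field name="attachmentName">report.pdf</field>
    <field name="attachmentBody">extracted text of the first attachment</field>
  </doc>
  <doc>
    <field name="id">email-42-att-2</field>
    <field name="docType">attachment</field>
    <field name="emailId">email-42</field>
    <field name="attachmentName">notes.txt</field>
    <field name="attachmentBody">extracted text of the second attachment</field>
  </doc>
</add>
```

With this layout a hit on `attachmentBody` identifies the attachment directly through its `id`, and `emailId` leads back to the message. Email fields such as subject or sender would have to be copied onto each attachment document if they should be searchable in the same query.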

I found one possible solution. All I need to do is set attachmentBody as stored.

This solution is not ideal because the index size will increase dramatically, but in my case that is not a problem: I will implement the highlighting feature too, and it needs those fields to be stored.
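A minimal sketch of what that could look like with the original dynamic fields is shown below. The `text_general` type and the use of the `/select` handler are assumptions, not something stated in the thread; `hl.fl` accepts a field glob, so every `attachmentBody_*` field can be highlighted.

```xml
<!-- schema.xml: keep the dynamic field, but store the body so it can be highlighted. -->
<!-- "text_general" is an assumed field type. -->
<dynamicField name="attachmentBody_*" type="text_general" indexed="true" stored="true"/>

<!-- solrconfig.xml: turn highlighting on by default for all attachment body fields. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="hl">true</str>
    <str name="hl.fl">attachmentBody_*</str>
  </lst>
</requestHandler>
```

The highlighting section of the response is keyed by document id and then by field name, so the field name of each snippet (attachmentBody_1, attachmentBody_2, and so on) shows where the term was found. The query itself still has to search those fields, for example through a catch-all field populated with copyField.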
