How to reduce the length of a multivalued field in Solr

We have a multivalued field in Solr whose length we want to reduce. A sample response is as follows:

 "response": {
     "numFound": 1,
     "start": 0,
     "docs": [
       {
         "created_date": "2016-11-23T13:47:46.55Z",
         "solr_index_date": "2016-12-01T08:21:59.78Z",
         "modified_date": "2016-12-13T08:45:44.507Z",
         "id": "FEAE38C2-ABFF-4F0C-8AFD-9B8F51036D8A",
         "Field1": [
           "false",
           "true",
           "true",
           .....   <= 1200 items
         ]
       }
     ]
 }

We have a large dataset, a couple of TB, and we are looking for an efficient way to alter all documents in Solr so that Field1 contains only the first 100 items.

Can something like this be done without writing a script that manually fetches each document, adjusts it, and pushes it back to Solr? Has anyone had a similar experience? Thanks

We faced this problem and solved it by using two collections. Use SolrEntityProcessor to move the documents from one collection to another.

[SolrEntityProcessor]

<dataConfig>
  <document>
    <entity name="sep" processor="SolrEntityProcessor" url="http://localhost:8983/solr/db" query="*:*"/>
  </document>
</dataConfig>

While moving the documents, pass each one through an updateRequestProcessorChain that includes a StatelessScriptUpdateProcessorFactory script, which can edit the document or truncate the multivalued field.
In the StatelessScriptUpdateProcessorFactory script you can get the field, apply your operations, and then reset the field.

[StatelessScriptUpdateProcessorFactory]

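function processAdd(cmd) {
    var doc = cmd.solrDoc;  // org.apache.solr.common.SolrInputDocument
    // getFieldValues returns all values of a multivalued field
    // (getFieldValue would return only the first one)
    var values = doc.getFieldValues("multiValueField");
    // Apply your operation to the values above, then write the result back:
    // doc.setField("multiValueField", value);
}
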
function processDelete(cmd) {
  // no-op
}

function processMergeIndexes(cmd) {
  // no-op
}

function processCommit(cmd) {
  // no-op
}

function processRollback(cmd) {
  // no-op
}

function finish() {
  // no-op
}
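
To make the "apply your operation" step concrete, here is a minimal sketch of a processAdd that keeps only the first 100 values of Field1. It is an illustration rather than a tested production script: the helper name firstN and the constant names are hypothetical, and it assumes the script engine exposes the Java collection returned by getFieldValues with its size() and toArray() methods. The other callbacks (processDelete, processCommit, and so on) would stay as the no-ops shown above.

```javascript
// Hypothetical names, chosen for illustration only.
var FIELD_NAME = "Field1";
var MAX_VALUES = 100;

// Returns a new array holding at most `limit` leading entries of an
// array-like value (a JavaScript array, or a Java array inside Solr).
function firstN(values, limit) {
    var kept = [];
    for (var i = 0; i < values.length && i < limit; i++) {
        kept.push(values[i]);
    }
    return kept;
}

// Truncates FIELD_NAME on each incoming document to MAX_VALUES entries.
function processAdd(cmd) {
    var doc = cmd.solrDoc;                      // SolrInputDocument
    var values = doc.getFieldValues(FIELD_NAME); // Collection, or null if absent
    if (values == null || values.size() <= MAX_VALUES) {
        return; // field missing or already short enough
    }
    var kept = firstN(values.toArray(), MAX_VALUES);
    // Drop the old values, then re-add only the ones we keep.
    doc.removeField(FIELD_NAME);
    for (var i = 0; i < kept.length; i++) {
        doc.addField(FIELD_NAME, kept[i]);
    }
}
```

The loop uses plain indexing instead of newer iteration syntax so that it should also run on older script engines. Removing the field and re-adding values one by one avoids passing a JavaScript array directly to a Java method.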

For more information on StatelessScriptUpdateProcessorFactory, you can refer to the question "On solr how can i copy selected values only from multi valued field to another multi valued field?", in which the multivalued field is edited using a script.
