Add Term Frequencies From Multiple Documents (Solr)

How can you use Solr math operations or function queries to add up the term frequency (tf) of each word across all documents returned by a query?

I know there are ways to add up term vectors iteratively in Java, but that can take a long time if the index is large or memory is limited. Solr has the raw term frequency values and can already add ordinary field values, so I think it should be able to add term frequencies too; I just don't know how.

Also, I don't know the words ahead of time: each document can contain any combination of words.

For this result:

"docs": [
  {
    "id": 0,
    "content": [
      "FOO FOO BAR"
    ]
  },
  {
    "id": 1,
    "content": [
      "FOO BAR"
    ]
  }
]},
"termVectors": [
  "uniqueKeyFieldName",
  [
    "0",
    [
      "FOO", ["tf", 2],
      "BAR", ["tf", 1]
    ],
    "1",
    [
      "FOO", ["tf", 1],
      "BAR", ["tf", 1]
    ]
  ]
]}
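The termVectors section above uses Solr's default "flat" JSON encoding of named lists, where names and values alternate inside one array. Below is a minimal sketch, assuming the response has already been parsed into plain Java lists by some JSON library, that turns one document's entry into a term → tf map. The class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlatTermVector {
    // Converts one document's flat term-vector list, e.g.
    // ["FOO", ["tf", 2], "BAR", ["tf", 1]], into a term -> tf map.
    static Map<String, Integer> toTfMap(List<?> flat) {
        Map<String, Integer> tf = new LinkedHashMap<>();
        for (int i = 0; i + 1 < flat.size(); i += 2) {
            String term = (String) flat.get(i);
            List<?> stats = (List<?>) flat.get(i + 1);
            // stats is itself a flat list of (name, value) pairs; keep only "tf".
            for (int j = 0; j + 1 < stats.size(); j += 2) {
                if ("tf".equals(stats.get(j))) {
                    // Number covers Integer, Long, etc., depending on the JSON parser.
                    tf.put(term, ((Number) stats.get(j + 1)).intValue());
                }
            }
        }
        return tf;
    }

    public static void main(String[] args) {
        List<Object> doc0 = List.of("FOO", List.of("tf", 2), "BAR", List.of("tf", 1));
        System.out.println(toTfMap(doc0)); // {FOO=2, BAR=1}
    }
}
```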

I would like something like this instead:

{"frequencies": {
  "FOO": 3,
  "BAR": 2
}}


UPDATE: I am now fine with a programmatic approach in Java, because I don't think Solr supports an operation like this out of the box.
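For the programmatic route, the core step is just merging per-document tf maps. A minimal sketch, assuming each document's term vector has already been turned into a `Map<String, Integer>` (the class name is hypothetical and no SolrJ API is used here):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SumTermFrequencies {
    // Adds up per-document term frequencies into one term -> total map.
    static Map<String, Integer> sum(List<Map<String, Integer>> perDoc) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, Integer> doc : perDoc) {
            // Map.merge inserts the tf for a new term, or adds it to the running total.
            doc.forEach((term, tf) -> totals.merge(term, tf, Integer::sum));
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> docs = List.of(
            Map.of("FOO", 2, "BAR", 1),   // doc "0": "FOO FOO BAR"
            Map.of("FOO", 1, "BAR", 1));  // doc "1": "FOO BAR"
        System.out.println(sum(docs));    // {BAR=2, FOO=3}
    }
}
```

Only one map of distinct terms is kept in memory, so memory grows with the vocabulary of the matched documents rather than with the number of documents.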

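totaltermfreq (alias ttf) gives the total number of times a term occurs in a field across the entire index, not only in the documents matched by the query.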
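To make the difference concrete, here is a toy sketch that tokenizes on whitespace as a stand-in for Solr's analyzer (an assumption). The third document is hypothetical, added only so that the index contains a document the query does not match. Note that the real ttf may also still count deleted documents that have not been merged away yet.

```java
import java.util.Arrays;
import java.util.List;

public class TtfVersusResultSum {
    // Occurrences of term in one whitespace-tokenized document (like termfreq).
    static int termFreq(String doc, String term) {
        return (int) Arrays.stream(doc.split("\\s+")).filter(term::equals).count();
    }

    // Sum of termFreq over a list of documents.
    static int sumTf(List<String> docs, String term) {
        return docs.stream().mapToInt(d -> termFreq(d, term)).sum();
    }

    public static void main(String[] args) {
        // Whole index; "FOO BAZ" is a hypothetical extra document.
        List<String> index = List.of("FOO FOO BAR", "FOO BAR", "FOO BAZ");
        // Documents matched by the query.
        List<String> results = index.subList(0, 2);
        System.out.println(sumTf(index, "FOO"));   // 4: roughly what ttf reports
        System.out.println(sumTf(results, "FOO")); // 3: what the question asks for
    }
}
```

The two numbers only agree when the query matches every document, for example `q=*:*`.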

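Have you already considered faceting? (Note that facet counts are document counts, so for the example above FOO would get 2, not 3.)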
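The gap between a facet count and a tf sum comes from repeated terms inside one document. The toy sketch below shows it on the question's two documents, again assuming whitespace tokenization in place of Solr's analyzer. Class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class FacetVersusTf {
    // Number of matched documents containing each term (what a facet count reports).
    static Map<String, Integer> docCounts(List<String> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String doc : docs) {
            // distinct(): a term repeated inside one document is counted once.
            Arrays.stream(doc.split("\\s+")).distinct()
                  .forEach(t -> counts.merge(t, 1, Integer::sum));
        }
        return counts;
    }

    // Total occurrences of each term across matched documents (what the question wants).
    static Map<String, Integer> tfSums(List<String> docs) {
        Map<String, Integer> sums = new TreeMap<>();
        for (String doc : docs) {
            for (String t : doc.split("\\s+")) {
                sums.merge(t, 1, Integer::sum);
            }
        }
        return sums;
    }

    public static void main(String[] args) {
        List<String> results = List.of("FOO FOO BAR", "FOO BAR");
        System.out.println(docCounts(results)); // {BAR=2, FOO=2}
        System.out.println(tfSums(results));    // {BAR=2, FOO=3}
    }
}
```

Facets can still be useful here: they list which terms occur in the matched documents, which is the word list the question says it doesn't have in advance.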

Have you checked the Stats Component? You can compute statistics over a function query, not just a stored field, by passing it as a stats.field with the {!func} local param. See stats.field={!func}termfreq('text','memory') in the Solr Stats Component example:

http://localhost:8983/solr/techproducts/select?q=*:*&wt=xml&stats=true&stats.field={!func}termfreq('text','memory')&stats.field=price&stats.field=popularity&rows=0&indent=true
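Because the Stats Component reports a sum for each stats.field, adding one `{!func}termfreq(...)` entry per word returns that word's total tf over the matched documents. The catch is that every word has to be named in the request, so the term list must come from somewhere first (for instance from facet results). The sketch below only builds such a URL. It assumes the terms are given in their indexed (analyzed) form and contain no single quotes, which would break the function syntax. The class name and the sample terms are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.StringJoiner;

public class StatsQueryBuilder {
    // URL-encodes one query-parameter value (Java 10+ overload taking a Charset).
    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Builds a select URL with one stats.field={!func}termfreq('field','term') per term.
    static String buildUrl(String base, String field, List<String> terms) {
        StringJoiner params = new StringJoiner("&");
        params.add("q=" + enc("*:*"));
        params.add("rows=0");      // only the stats are needed, not the documents
        params.add("stats=true");
        for (String term : terms) {
            params.add("stats.field=" + enc("{!func}termfreq('" + field + "','" + term + "')"));
        }
        return base + "?" + params;
    }

    public static void main(String[] args) {
        String url = buildUrl("http://localhost:8983/solr/techproducts/select",
                              "text", List.of("memory", "drive"));
        System.out.println(url);
    }
}
```

In the response, the "sum" value reported for each stats.field is that word's tf summed over the documents matched by q.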
