
Elasticsearch native script - accessing field value of indexed document

I'm trying to modify the Cosine Similarity Script from imotov on GitHub. In his script, docWeightSum only takes into account the term frequency (tf) of terms that appear in the query, not all the terms in the document itself.

Take the example below. The docWeightSum would be 9 (squared tf: 4 for "I", 4 for "am", 1 for "Sam"). What I want docWeightSum to be is 10 (add 1 for "ham"), because I want to normalize the dot product by the magnitudes of both vectors.

doc: "I am am I ham Sam"

query: "Sam I am"
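
To make the 9 versus 10 arithmetic concrete, here is a minimal sketch in plain Python (not Elasticsearch code). It assumes whitespace tokenization and case-sensitive terms, which a real analyzer would probably not match exactly; the helper name sum_squared_tf is made up for illustration.

```python
from collections import Counter


def sum_squared_tf(text, restrict_to=None):
    """Sum of squared term frequencies of `text`.

    If `restrict_to` is given, only terms in that set are counted
    (this mimics a script that only sees the query terms).
    Assumption: terms are split on whitespace and compared case-sensitively.
    """
    tf = Counter(text.split())
    return sum(
        count * count
        for term, count in tf.items()
        if restrict_to is None or term in restrict_to
    )


doc = "I am am I ham Sam"
query = "Sam I am"

# Only the terms that also occur in the query: I, am, Sam
query_only = sum_squared_tf(doc, restrict_to=set(query.split()))
# Every term in the document, including "ham"
all_terms = sum_squared_tf(doc)

print(query_only, all_terms)  # 9 10
```

The same helper could be run on the client side before indexing to produce the value stored alongside each document.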

So I actually have two questions, given that I index documents into Elasticsearch like this:

POST /termscore/doc
{
   "text": "I am am I ham",
   "docWeightSum": 9
}
  • Is there an existing API to get the sum of squares of all tf values for each indexed document, or to get the tf of terms in the document that are not in the query? If not, how can I compute this sum of squares?
  • If I precompute the sum of squares of tf for each document and put it into Elasticsearch along with the document content, as in the example above, then how can I access that "docWeightSum" value when computing the score?

I am using Elasticsearch 1.7.

Thanks,

To answer your question: it's possible, but it would be very inefficient to calculate docWeightSum at runtime. So, assuming that you precompute the value and index it in a separate field, you can access it from a native script using the doc lookup mechanism. If your calculations are not very complex, you might be able to get by with field_value_factor in a function_score query and avoid writing your own script altogether.
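
As a rough illustration of how a precomputed value would enter the score, the sketch below computes cosine similarity on raw term frequencies in plain Python. It is not the native script itself and not imotov's implementation: the function name cosine_score is hypothetical, tokenization is assumed to be a whitespace split, and doc_weight_sum stands in for whatever the script would read from the stored "docWeightSum" field.

```python
import math
from collections import Counter


def cosine_score(query_text, doc_text, doc_weight_sum):
    """Cosine similarity on raw tf, normalized by both vector magnitudes.

    `doc_weight_sum` is assumed to be the precomputed sum of squared tf
    over ALL document terms, as stored with the document at index time.
    """
    query_tf = Counter(query_text.split())
    doc_tf = Counter(doc_text.split())

    # The dot product only needs terms that occur in the query;
    # Counter returns 0 for terms missing from the document.
    dot = sum(q_tf * doc_tf[term] for term, q_tf in query_tf.items())
    query_weight_sum = sum(tf * tf for tf in query_tf.values())

    if doc_weight_sum == 0 or query_weight_sum == 0:
        return 0.0
    return dot / (math.sqrt(doc_weight_sum) * math.sqrt(query_weight_sum))


doc = "I am am I ham Sam"
query = "Sam I am"

# Normalized by the full document magnitude (stored value 10)
full_norm = cosine_score(query, doc, doc_weight_sum=10)
# Normalized by query terms only (value 9), as in the question
query_only_norm = cosine_score(query, doc, doc_weight_sum=9)

print(full_norm, query_only_norm)
```

With the full document magnitude the score is lower, because the extra term "ham" lengthens the document vector without adding to the dot product.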

That said, I suspect you are asking the wrong question. Instead of trying to implement this as a scoring script, I would suggest looking into creating your own custom SimilarityProvider. You will most likely find that most of the constructs you are trying to shoehorn into a score script are already there and are much easier to implement and use.
