Elasticsearch native script - accessing field value of indexed document
I'm trying to modify the Cosine Similarity Script from imotov on GitHub. In his script, docWeightSum only takes into account the term frequency (tf) of terms that appear in the query, not all the terms in the document itself.
Take the example below. The docWeightSum would be 9 (4 for "I", 4 for "am", 1 for "Sam"). What I want docWeightSum to be is 10 (add 1 for "ham"), because I want to normalize the dot product by the magnitudes of both vectors.
doc: "I am am I ham Sam"
query: "Sam I am"
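The arithmetic in this example can be reproduced with a short Python sketch. It assumes the weights are raw term frequencies (no idf), that each term contributes its squared tf to docWeightSum (which is what the "4 for I" figures suggest), and that text is lowercased and split on whitespace. The helper names term_frequencies and weight_sum are hypothetical and are not part of imotov's script.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Count how often each lowercased, whitespace-separated token occurs."""
    return Counter(text.lower().split())


def weight_sum(tf, terms=None):
    """Sum of squared term frequencies, optionally restricted to `terms`."""
    if terms is None:
        terms = tf.keys()
    # Counter returns 0 for terms that are absent from the document.
    return sum(tf[t] ** 2 for t in terms)


doc_tf = term_frequencies("I am am I ham Sam")
query_tf = term_frequencies("Sam I am")

# Only the terms that also occur in the query ("ham" is skipped).
query_only_sum = weight_sum(doc_tf, query_tf.keys())
# Every term in the document ("ham" is included).
full_doc_sum = weight_sum(doc_tf)

# Dot product of the two tf vectors, normalized by both magnitudes.
dot = sum(doc_tf[t] * query_tf[t] for t in query_tf)
cosine = dot / (math.sqrt(full_doc_sum) * math.sqrt(weight_sum(query_tf)))

print(query_only_sum, full_doc_sum)  # prints "9 10"
```

Under these assumptions, the only difference between the two sums is the squared tf of terms that the query never mentions, which is why that part cannot be derived from the query terms alone.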
So I actually have 2 questions, as I index documents into Elasticsearch like this:
POST /termscore/doc
{
"text": "I am am I ham",
"docWeightSum": 9
}
I am using Elasticsearch 1.7.
Thanks,
To answer your question: it's possible, but it would be very inefficient to calculate docWeightSum at runtime. So, assuming that you precompute the value and index it in a separate field, you can access it from a native script using the doc lookup mechanism. If your calculations are not very complex, you might be able to get by with field value factor in a function_score query and avoid writing your own script altogether.
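As a rough illustration of the precompute-and-index route, the Python sketch below builds an index body and a function_score request body as plain dictionaries. It assumes whitespace tokenization and squared term frequencies, and it introduces a hypothetical extra field, docNorm, holding 1/sqrt(docWeightSum) so that field_value_factor can multiply the score by it directly. The function names are also hypothetical, and the base score here is still whatever the match query produces under the default similarity, not a raw dot product.

```python
import json
import math
from collections import Counter


def build_index_body(text):
    """Precompute the document weight sum at index time (sketch only)."""
    tf = Counter(text.lower().split())
    doc_weight_sum = sum(count * count for count in tf.values())
    return {
        "text": text,
        "docWeightSum": doc_weight_sum,
        # Hypothetical field: inverse magnitude of the document vector.
        "docNorm": 1.0 / math.sqrt(doc_weight_sum),
    }


def build_query_body(query_text):
    """Scale the match score by the precomputed docNorm field."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"text": query_text}},
                "field_value_factor": {"field": "docNorm", "modifier": "none"},
                "boost_mode": "multiply",
            }
        }
    }


index_body = build_index_body("I am am I ham Sam")
query_body = build_query_body("Sam I am")
print(json.dumps(index_body))
print(json.dumps(query_body))
```

Storing the inverse magnitude rather than the sum is a design choice made for this sketch: it avoids having to chain a square root and a reciprocal at query time.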
Having said that, I suspect you are asking the wrong question. Instead of trying to implement it as a scoring script, I would suggest looking into creating your own custom SimilarityProvider. You will most likely find that most of the constructs you are trying to shoehorn into a score script are already there and are much easier to implement and use.