How to score importance of fields in Elasticsearch NEST?

I have looked at boosting and the "Function Score Query", but either I have not understood how to use them for my purpose, or I have not found the right technique to achieve my goal.

TL;DR: The user tells me his preferences regarding different fields/aspects of my products, and I would like Elasticsearch to return the products that BEST match those preferences. Is this possible?

I have a class with a lot of fields given as numbers, e.g.:

public class Product
{
   public double? Weight { get; set; }
   public int? Price { get; set; }
   public double? Size { get; set; }
}

A search will be based on a series of prioritizations/scores decided at runtime, e.g.:

Weight: 0 negative
Price: 5 negative
Size: 8 positive

These scores (normalized between 0 and 10) mean that this user doesn't care about the weight of the product. He cares somewhat about the price, and he wants it negatively correlated with the value of the field (i.e. he wants the price to be low, but "only" with an importance of 5 out of 10). The most important thing for this user is the size, which should be "large".

For this example I want to search across all of my products, giving a higher score to products with a large size, treating a low price as "medium" important, and not caring about the weight.

What might such a query look like?

PS: Any links to documentation/guides for NEST/Elasticsearch would be appreciated. I haven't found the official documentation that helpful.

EDIT: Let me rephrase: A user tells me how important different aspects of my products are, e.g. the price, weight and size. To some users a low weight is VERY important (i.e. they score the importance of a LOW weight = 10), to others the price is very important, and to some the weight is important. To some none of these are important, and to some only certain fields on my product are important.

After the user has scored the importance of each aspect of my products, I need to search for the products that best match the user's preferences.

As such, if the user thinks the weight and price are the most important, I want Elasticsearch to return products that have a very low weight and price, without caring about the size.

Example: In Elasticsearch I have 4 products (Weight = W, Size = S, Price = P):

P1: W=200, S=40, P=2500
P2: W=50, S=10, P=2000
P3: W=400, S=45, P=4000
P4: W=200, S=45, P=3000

Low weight/price = good, high size = good

If a user scores:

Weight=10, Price=0, Size=5

The result should be the top X results, sorted (using the scoring system in Elasticsearch?) as follows: P2, P4, P1, P3 (because a low weight is the most important, followed by a big size, with the price being irrelevant)

If a user scores:

Weight=5, Price=3, Size=8

The result should be the top X results, sorted as follows: P4, P3, P1, P2 (because a high/big size is the most important, followed by low weight, with the price being of less importance)

First of all, I am not really sure you know what you want to do here; your definitions use words like good or bad, and these terms are too vague to define a program. Here is a simple program that does something like what you are asking:

var index = "product";
            var type = "product";

            var db = new ElasticClient(new Uri("http://localhost:9200"));

            await db.DeleteIndexAsync(index);

            // I am using dynamic data, but you can use your class as well; it's easier
            await db.IndexAsync(new 
            {
                name = "P1", W=200, S=40, P=2500
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P2", W=50, S=10, P=2000
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P3", W=400, S=100, P=1000
            }, i=>i.Index(index).Type(type));

            await db.IndexAsync(new 
            {
                name = "P4", W=200, S=45, P=3000
            }, i=>i.Index(index).Type(type));

            // Give Elasticsearch time to refresh the index so the new documents are searchable
            await Task.Delay(1000);

            // I think the fields need some sort of normalization; this is a max-based normalization, so fetch each field's maximum
            var max = await db.SearchAsync<dynamic>(s =>
               s.Size(0)
               .Index(index)
               .Type(type)
               .Aggregations(aggr =>
                   aggr
                   .Max("maxWeight", f => f.Field("w"))
                   .Max("maxPrice", f => f.Field("p"))
                   .Max("maxSize", f => f.Field("s"))));

            // Calculate the factors. Dividing by the max value normalizes the multivariable data so that all values are on a scale from 0 to 1.
            // The max value will always map to 1 and the others will be a percentage of the max value; this only works for non-negative values.
            // You can use some other way of normalizing, but that depends on the data.
            // Note: (10 - 5) only shrinks the weight factor; the factor stays positive, so heavier products still score higher on this term.
            var paramsData1 = new
            {
                Weight = (10 - 5) / max.Aggs.Max("maxWeight").Value,
                Price = 3 / max.Aggs.Max("maxPrice").Value,
                Size = 8 / max.Aggs.Max("maxSize").Value
            };

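            // The first query boosts the fields by the factors entered above.
            // BoostMode.Sum adds the function result to the query score; the functions themselves are combined with the default score_mode (multiply).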
            var items = await db.SearchAsync<dynamic>(s =>
                s.Index(index)
                .Type(type)
                .Query(q => q.FunctionScore(fs =>
                    fs.Functions(ff =>
                        ff.FieldValueFactor(fvf => fvf.Field("w").Factor(paramsData1.Weight))
                        .FieldValueFactor(fvf => fvf.Field("s").Factor(paramsData1.Size))
                        .FieldValueFactor(fvf => fvf.Field("p").Factor(paramsData1.Price)))
                    .BoostMode(FunctionBoostMode.Sum))));

            System.Console.WriteLine("______________________________");
            foreach (var item in items.Hits)
            {
                System.Console.WriteLine($"Name:{item.Source.name};S:{item.Source.s};W:{item.Source.w};P:{item.Source.p};");
            }


            var paramsData2 = new
            {
                // (10 - importance) is meant to reverse the weighting, since from what I can tell lower is better
                Weight =(10 - 10) / max.Aggs.Max("maxWeight").Value,
                Price = 0 / max.Aggs.Max("maxPrice").Value,
                Size = 5 / max.Aggs.Max("maxSize").Value
            };

            // You can also write your own score function by hand if needed and do some sort of calculation.
            var itemsScript = await db.SearchAsync<dynamic>(s =>
                s.Index(index)
                .Type(type)
                .Query(q => q.FunctionScore(fs => fs.Functions(ff =>
                    ff.ScriptScore(
                    ss =>
                        ss.Script(script => script.Params(p =>
                            p.Add("Weight", paramsData2.Weight)
                            .Add("Price", paramsData2.Price)
                            .Add("Size", paramsData2.Size))
                            .Inline("params.Weight * doc['w'].value + params.Price * doc['p'].value + params.Size * doc['s'].value")))))));

            System.Console.WriteLine("______________________________");
            foreach (var item in itemsScript.Hits)
            {
                System.Console.WriteLine($"Name:{item.Source.name};S:{item.Source.s};W:{item.Source.w};P:{item.Source.p};");
            }
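One way to make "lower is better" explicit, offered here as an assumption and not as what the program above computes, is to flip the normalized value for the negatively correlated fields. In this sketch $w_f$ is the user's importance for field $f$ (0 to 10), $x_f(d)$ is the value of that field in document $d$, $\max_f$ is the aggregated maximum, and $F^{+}$ and $F^{-}$ are hypothetical names for the sets of higher-is-better and lower-is-better fields:

```latex
\[
\operatorname{score}(d)
  = \sum_{f \in F^{+}} w_f \,\frac{x_f(d)}{\max_f}
  \;+\; \sum_{f \in F^{-}} w_f \left(1 - \frac{x_f(d)}{\max_f}\right)
\]
```

With the four products from the question and importances Weight=10, Price=0, Size=5, P2 gets 10(1 - 50/400) + 5(10/45) ≈ 9.86 and P4 gets 10(1 - 200/400) + 5(45/45) = 10, so this particular normalization puts P4 just ahead of P2. The ranking is sensitive to how the fields are normalized, which is why the expected order has to be defined precisely first.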

But this is just a start. Factor analysis is a field of study by itself. Here are a few links on scripting and function scoring; I hope they help.

https://www.elastic.co/guide/en/elasticsearch/painless/5.5/painless-examples.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/script-score.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html#scoring-theory
https://jontai.me/blog/2013/01/advanced-scoring-in-elasticsearch/ (the syntax in this one is out of date, but the logic still stands)
https://qbox.io/blog/optimizing-search-results-in-elasticsearch-with-scoring-and-boosting
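Before moving a scoring rule into Elasticsearch, it can help to try it offline on a handful of products. The following is a minimal sketch in plain C# (no NEST calls), assuming max-based normalization and that weight and price are lower-is-better while size is higher-is-better. The `Product`, `PreferenceRanking`, `Term` and `Rank` names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical container for the question's sample data.
public sealed class Product
{
    public Product(string name, double w, double s, double p) { Name = name; W = w; S = s; P = p; }
    public string Name { get; }
    public double W { get; }
    public double S { get; }
    public double P { get; }
}

public static class PreferenceRanking
{
    // One term of the score: importance (0..10) times the max-normalized value.
    // For lower-is-better fields the normalized value is flipped (1 - value / max).
    public static double Term(double importance, double value, double max, bool lowerIsBetter)
    {
        double normalized = max > 0 ? value / max : 0.0;
        return importance * (lowerIsBetter ? 1.0 - normalized : normalized);
    }

    // Returns product names ordered from best to worst match.
    public static List<string> Rank(IReadOnlyList<Product> items,
                                    double weightImportance,
                                    double priceImportance,
                                    double sizeImportance)
    {
        double maxW = items.Max(p => p.W);
        double maxP = items.Max(p => p.P);
        double maxS = items.Max(p => p.S);

        return items
            .OrderByDescending(p =>
                Term(weightImportance, p.W, maxW, lowerIsBetter: true) +
                Term(priceImportance, p.P, maxP, lowerIsBetter: true) +
                Term(sizeImportance, p.S, maxS, lowerIsBetter: false))
            .Select(p => p.Name)
            .ToList();
    }

    public static void Main()
    {
        // The four products from the question.
        var products = new List<Product>
        {
            new Product("P1", 200, 40, 2500),
            new Product("P2", 50, 10, 2000),
            new Product("P3", 400, 45, 4000),
            new Product("P4", 200, 45, 3000),
        };

        Console.WriteLine(string.Join(",", Rank(products, 10, 0, 5)));
        Console.WriteLine(string.Join(",", Rank(products, 5, 3, 8)));
    }
}
```

Comparing the printed order with the order you expect is a quick way to tune the normalization before expressing the same arithmetic in a script score.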
