Elasticsearch wildcard search and relevance

I am trying to implement wildcard search for a suggestion dropdown. I have been trying to figure this out for a few days already. :(

I have a list of restaurants (4000-7000). I want to run a wildcard search on restaurant names and display first the results where the search term is at the start of the name.

I tried indexing the name field without an analyzer, with an ngram analyzer, and with many other solutions I found online, but without luck.

The best results so far come from this setup:

settings:
  analysis: {
    analyzer: {
      default: {
        tokenizer: :keyword, 
        filter: [:lowercase]
      }
    }
  }

And I index the name field like this:

indexes :name, type: :string, analyzer: :default

Search: query: {wildcard: {name: '*le*'}}
Result: Mr. Beef on Orleans, Miller's Pub, Merlo on Maple, Le Bouchon, Les Nomades, Leonardo's Ristorante, Lem's Bar-BQ House, Le Petit Paris, Joy Yee's Noodles - Chinatown, J. Alexander's (Lincoln Park), Indian Garden - Streeterville, Goose Island Brewpub - Wrigleyville, Tweet ... Let's Eat!, Arco de Cuchilleros, Al's #1 Italian Beef - Little Italy

I want the results that start with 'le' to come first, with a higher score, because people usually search by the beginning of a restaurant's name. But I cannot drop the leading * because I also want the results that merely contain the term, only with a lower score. In the example above, 'Le Colonial', 'Le Petit Paris', and 'Les Nomades' should come first.

How can I accomplish this?

My other concern is performance. I know that a wildcard at both ends is the worst possible case, but I could not find any solution using ngram or shingle that gives me acceptable results.

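Use boost to push the names that start with the search term to the top. Combine a plain "contains" clause with a boosted "starts with" clause inside a bool should, so that names matching both clauses score higher than names matching only the first.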

Using two wildcard queries:

curl -XPOST "http://hostname:9200/index/type/_search" -d'
{
"size": 2000,
"query": {
    "bool": {
        "should": [
            {
                "wildcard": {
                    "name": {
                        "value": "*le*"
                    }
                }
            },
            {
                "wildcard": {
                    "name": {
                        "value": "le*",
                        "boost": 5
                    }
                }
            }
        ]
    }
}
}'
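To see why the boosted clause reorders the results, here is a rough in-memory sketch in Ruby. It is not Elasticsearch itself: it assumes each matching should clause contributes a constant score equal to its boost and that the contributions are summed, which is a simplification of real scoring. The names `score` and `ranked` are hypothetical helpers, and the sample is a subset of the result list above.

```ruby
# Simplified model of a bool/should query with two wildcard clauses.
# Assumption: a matching clause contributes its boost as a constant score.

NAMES = [
  "Mr. Beef on Orleans",
  "Le Bouchon",
  "Miller's Pub",
  "Les Nomades",
  "Arco de Cuchilleros"
].freeze

# Each clause is a glob pattern plus a boost (1.0 when none is given).
CLAUSES = [
  { pattern: "*le*", boost: 1.0 },
  { pattern: "le*",  boost: 5.0 }
].freeze

# The keyword tokenizer plus lowercase filter indexes the whole name as one
# lowercased term, so patterns are matched against name.downcase.
def score(name, clauses)
  term = name.downcase
  clauses.sum { |c| File.fnmatch(c[:pattern], term) ? c[:boost] : 0.0 }
end

# Keep only matching names, highest score first. The alphabetical tie-break
# is only here to make the output deterministic.
def ranked(names, clauses)
  names
    .map { |n| [n, score(n, clauses)] }
    .select { |_, s| s > 0 }
    .sort_by { |n, s| [-s, n] }
end

ranked(NAMES, CLAUSES).each { |name, s| puts format("%-22s %.1f", name, s) }
```

In this model the names starting with "le" match both clauses and score 6.0, while names that only contain "le" score 1.0, so the prefix matches sort first.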

Using one wildcard query and one prefix query:

curl -XPOST "http://hostname:9200/index/type/_search" -d'
{
"size": 2000,
"query": {
    "bool": {
        "should": [
            {
                "wildcard": {
                    "name": {
                        "value": "*le*"
                    }
                }
            },
            {
                "prefix": {
                    "name": {
                        "value": "le",
                        "boost": 2
                    }
                }
            }
        ]
    }
}
}'
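For a suggestion dropdown the term comes from user input, so the request body has to be built at runtime. Below is a minimal Ruby sketch of that, assuming the query term is not analyzed at search time (as is typical for term-level queries such as wildcard and prefix) and therefore has to be lowercased by hand to match the lowercased index. The helper name `suggestion_query` and its keyword arguments are hypothetical; user input containing `*` or `?` is not escaped here.

```ruby
require "json"

# Hypothetical helper: builds the bool/should body for a user-typed term.
# The term is lowercased because the index stores lowercased names and the
# wildcard/prefix term is assumed not to pass through the analyzer.
def suggestion_query(term, prefix_boost: 2, size: 2000)
  t = term.downcase
  {
    size: size,
    query: {
      bool: {
        should: [
          # Matches any name containing the term.
          { wildcard: { name: { value: "*#{t}*" } } },
          # Adds extra score when the name starts with the term.
          { prefix: { name: { value: t, boost: prefix_boost } } }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(suggestion_query("Le"))
```

The printed JSON has the same shape as the request body above and can be sent to the same `_search` endpoint.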
