Elasticsearch wildcard search and relevance

Question

I am trying to implement wildcard search for a suggestion dropdown. I have been trying to figure this out for a few days now. :(

I have a list of restaurants (4000-7000). I want to run a wildcard search on the restaurant names and display first the results where the search text is at the start of the name.

I tried indexing the name field without an analyzer, with an ngram analyzer, and with many other solutions I found on the net, but without luck.

The best results so far come from this setup:

settings:
  analysis: {
    analyzer: {
      default: {
        tokenizer: :keyword, 
        filter: [:lowercase]
      }
    }
  }

And I index the name field like this:

indexes :name, type: :string, analyzer: :default

Search: query: {wildcard: {name: '*le*'}}
Result: Mr. Beef on Orleans, Miller's Pub, Merlo on Maple, Le Bouchon, Les Nomades, Leonardo's Ristorante, Lem's Bar-BQ House, Le Petit Paris, Joy Yee's Noodles - Chinatown, J. Alexander's (Lincoln Park), Indian Garden - Streeterville, Goose Island Brewpub - Wrigleyville, Tweet ... Let's Eat!, Arco de Cuchilleros, Al's #1 Italian Beef - Little Italy

I want the results that start with 'le' to come first, with a higher score, because people usually search for the start of a restaurant's name. But I cannot drop the leading *, because I also want the results that merely contain the text, just with a lower score. In the example above, 'Le Colonial', 'Le Petit Paris' and 'Les Nomades' should come first.

How can I accomplish this?

My other concern is performance. I know that a wildcard at both ends is the worst possible case, but I could not find any solution with ngram or shingle that gives me acceptable results.

Answer 1

Use boost to rank the names that start with the search text on top.

Using two wildcard queries:

curl -XPOST "http://hostname:9200/index/type/_search" -d'
{
"size": 2000,
"query": {
    "bool": {
        "should": [
            {
                "wildcard": {
                    "name": {
                        "value": "*le*"
                    }
                }
            },
            {
                "wildcard": {
                    "name": {
                        "value": "le*",
                        "boost": 5
                    }
                }
            }
        ]
    }
}
}'
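
To see why the boosted clause pushes 'Le ...' names to the top, here is a rough Ruby simulation of the query above. It assumes a simplified scoring model in which every matching should clause adds its boost (wildcard queries are, as far as I know, scored as constant-score queries under the default rewrite). Real Elasticsearch scores are also normalized, so only the resulting order is meaningful. The helper names are made up for this illustration.

```ruby
# Translate a wildcard pattern such as "*le*" into an anchored Regexp.
def wildcard_to_regexp(pattern)
  body = pattern.chars.map do |ch|
    case ch
    when "*" then ".*"             # any run of characters
    when "?" then "."              # exactly one character
    else Regexp.escape(ch)         # literal character
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::MULTILINE)
end

# Sum the boosts of every clause whose pattern matches the lowercased name,
# mimicking the keyword tokenizer + lowercase filter from the question.
def simulated_score(name, clauses)
  term = name.downcase
  clauses.sum { |c| wildcard_to_regexp(c[:value]).match?(term) ? c[:boost] : 0 }
end

clauses = [
  { value: "*le*", boost: 1 },  # matches anywhere in the name
  { value: "le*",  boost: 5 }   # matches only at the start of the name
]

names = ["Miller's Pub", "Le Bouchon", "Merlo on Maple", "Les Nomades"]

# Highest simulated score first; ties are broken alphabetically.
ranked = names.sort_by { |n| [-simulated_score(n, clauses), n] }
puts ranked
```

Names that start with 'le' match both clauses and collect both boosts, while names that only contain 'le' match just the first clause, so they sort lower.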

Using one wildcard query and one prefix query:

curl -XPOST "http://hostname:9200/index/type/_search" -d'
{
"size": 2000,
"query": {
    "bool": {
        "should": [
            {
                "wildcard": {
                    "name": {
                        "value": "*le*"
                    }
                }
            },
            {
                "prefix": {
                    "name": {
                        "value": "le",
                        "boost": 2
                    }
                }
            }
        ]
    }
}
}'
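
Since the question uses Ruby, the same request body could be built from the user's input along these lines. This is only a sketch: suggestion_query is a hypothetical helper, not part of any client library, and the defaults for the field name, boost and size are simply copied from the curl example above.

```ruby
require "json"

# Hypothetical helper: builds the body for "contains the text, but names
# starting with it rank first" (one wildcard clause plus one boosted prefix clause).
def suggestion_query(text, field: "name", prefix_boost: 2, size: 2000)
  # Wildcard and prefix queries are not analyzed, so lowercase the input
  # to match the terms produced by the keyword tokenizer + lowercase filter.
  term = text.downcase
  {
    size: size,
    query: {
      bool: {
        should: [
          { wildcard: { field => { value: "*#{term}*" } } },
          { prefix:   { field => { value: term, boost: prefix_boost } } }
        ]
      }
    }
  }
end

body = suggestion_query("Le")
puts JSON.pretty_generate(body)
```

The printed JSON can be sent as the body of the _search request. Note that any * or ? typed by the user would still be interpreted as wildcard characters by the first clause.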

Solution 1: 12, accepted, 2014-04-21 12:46:45
