ElasticSearch Analyzer on text field

Here is my field in Elasticsearch:

"keywordName": {
  "type": "text",
  "analyzer": "custom_stop"
}

Here is my analyzer:

"custom_stop": {
  "type": "custom",
  "tokenizer": "standard",
  "filter": [
    "my_stop",
    "my_snow",
    "asciifolding"
  ]
}

And here are my filters:

"my_stop": {
  "type": "stop",
  "stopwords": "_french_"
},
"my_snow": {
  "type": "snowball",
  "language": "French"
}

Here are the documents in my index (all stored in my only field, keywordName):

"canne a peche", "canne", "canne a peche telescopique", "iphone 8", "iphone 8 case", "iphone 8 cover", "iphone 8 charger", "iphone 8 new"

When I search for "canne", it returns the "canne" document, which is what I want:

GET ads/_search
{
   "query": {
    "match": {
      "keywordName": {
        "query": "canne",
        "operator":  "and"
      }
    }
  },
  "size": 1
}
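To see why a query such as "iphone 8" returns several hits at all, here is a toy Python model of the operator setting of a match query. With "and", a document is a hit only if it contains every analyzed query term. The analyze function is a hypothetical stand-in (lowercase plus whitespace split), not the custom_stop analyzer, so stop words, stemming and accent folding are deliberately ignored.

```python
# Toy model of a match query's "operator" setting.
# ASSUMPTION: a lowercase/whitespace split stands in for the real custom_stop analyzer.
def analyze(text: str) -> list[str]:
    """Hypothetical stand-in analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def matches(query: str, doc: str, operator: str = "or") -> bool:
    """Return True if doc would be a hit for query under the given operator."""
    query_terms = set(analyze(query))
    doc_terms = set(analyze(doc))
    if operator == "and":
        # every query term must appear in the document
        return query_terms <= doc_terms
    # "or": at least one query term must appear
    return bool(query_terms & doc_terms)


docs = [
    "canne a peche", "canne", "canne a peche telescopique",
    "iphone 8", "iphone 8 case", "iphone 8 cover", "iphone 8 charger", "iphone 8 new",
]

hits = [d for d in docs if matches("iphone 8", d, operator="and")]
print(hits)
```

In this model all five iPhone documents contain both terms, so the "and" operator cannot tell them apart. Their order is left entirely to the relevance score.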

When I search for "canne à pêche", it returns "canne a peche", which is also fine. The same goes for "Cannes à Pêche" -> "canne a peche".

Here is the tricky part: when I search for "iphone 8", it returns "iphone 8 cover" instead of "iphone 8". If I set the size to 5 (the query returns 5 results containing "iphone 8"), I see that "iphone 8" is only the 4th result by score: first comes "iphone 8 cover", then "iphone 8 case", then "iphone 8 new", and only then "iphone 8"...

Here is the result of the query:

{
  "took": 5,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 5,
    "max_score": 1.4009607,
    "hits": [
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 cover",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 cover"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 case",
        "_score": 1.4009607,
        "_source": {
          "keywordName": "iphone 8 case"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 new",
        "_score": 0.70293105,
        "_source": {
          "keywordName": "iphone 8 new"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8",
        "_score": 0.5804671,
        "_source": {
          "keywordName": "iphone 8"
        }
      },
      {
        "_index": "ads",
        "_type": "keyword",
        "_id": "iphone 8 charge",
        "_score": 0.46705723,
        "_source": {
          "keywordName": "iphone 8 charge"
        }
      }
    ]
  }
}

How can I keep the flexibility I have for a keyword like "canne a peche" (accents, capital letters, plurals) while also telling Elasticsearch that, when there is an exact match ("iphone 8" = "iphone 8"), it should return that exact keywordName?

The match query ranks hits with a TF/IDF-style relevance score (BM25 by default since Elasticsearch 5.0). Results are therefore ordered by how statistically relevant the matching terms are, not by whether the whole field matches exactly. If you want the exact match whenever one exists, first run an exact-match query (for example with query_string) and, only if it returns nothing, fall back to your match query.
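The two-step approach described above can be sketched in Python. The search callable is a hypothetical stand-in for whatever client call sends a query body to the index and returns the list of hits. For the exact step, this sketch swaps in a term query on a keyword sub-field assumed to be named keywordName.raw, in place of query_string. A term query on an unanalyzed field only matches the identical string.

```python
from typing import Callable

# Hypothetical type: takes an Elasticsearch query body, returns a list of hits.
SearchFn = Callable[[dict], list]


def search_with_fallback(search: SearchFn, text: str, size: int = 10) -> list:
    """Try an exact match first; fall back to the analyzed match query."""
    # Step 1: exact match on the (assumed) keyword sub-field keywordName.raw.
    exact_body = {
        "query": {"term": {"keywordName.raw": {"value": text}}},
        "size": size,
    }
    hits = search(exact_body)
    if hits:
        return hits
    # Step 2: nothing matched exactly, so use the flexible analyzed query.
    analyzed_body = {
        "query": {"match": {"keywordName": {"query": text, "operator": "and"}}},
        "size": size,
    }
    return search(analyzed_body)


# Fake in-memory search, used only to exercise the control flow above.
DOCS = ["canne a peche", "iphone 8", "iphone 8 cover"]


def fake_search(body: dict) -> list:
    q = body["query"]
    if "term" in q:
        value = q["term"]["keywordName.raw"]["value"]
        return [d for d in DOCS if d == value]
    words = q["match"]["keywordName"]["query"].lower().split()
    return [d for d in DOCS if all(w in d.split() for w in words)]


print(search_with_fallback(fake_search, "iphone 8"))
print(search_with_fallback(fake_search, "Canne a peche"))
```

The trade-off is a second round trip whenever there is no exact match. A single bool query with both clauses avoids that.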

I suggest adding an unanalyzed keyword sub-field, named raw, to the mapping, something like this:

    "keywordName": {
      "type": "text",
      "analyzer": "custom_stop",
      "fields": {
        "raw": {
          "type": "keyword"
        }
      }
    }

And the query, which combines the analyzed match with an exact term query on keywordName.raw:

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "keywordName": {
              "query": "iphone 8",
              "operator": "and"
            }
          }
        },
        {
          "term": {
            "keywordName.raw": {
              "value": "iphone 8"
            }
          }
        }
      ]
    }
  },
  "size": 10
}
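A bool query adds up the scores of the should clauses that match. The document whose raw value equals the input exactly therefore collects both the match score and the term score. If that sum is still not enough to lift it above the other hits, the term clause can be given a boost. The helper below is a hypothetical sketch that builds the same body for any input. The function name and the default boost of 10 are assumptions, not values from the answer.

```python
import json


def exact_first_query(text: str, exact_boost: float = 10.0, size: int = 10) -> dict:
    """Build a bool/should body that scores an exact keyword match above the rest.

    exact_boost is an assumed value; tune it against your own scores.
    """
    return {
        "query": {
            "bool": {
                "should": [
                    # analyzed match through the custom_stop analyzer
                    {"match": {"keywordName": {"query": text, "operator": "and"}}},
                    # unanalyzed exact match on the keyword sub-field, boosted
                    {"term": {"keywordName.raw": {"value": text, "boost": exact_boost}}},
                ]
            }
        },
        "size": size,
    }


print(json.dumps(exact_first_query("iphone 8"), indent=2))
```

The printed body can be sent as-is to GET ads/_search. Since the term clause only fires on an identical string, the boost lifts just the exact document and leaves the ranking of the other hits unchanged.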
