
Partial search returns zero hits

I have managed to do an exact search using Elasticsearch (v6.1.3). But when I try a partial or case-insensitive search (e.g. {"query": {"match": {"demodata": "Hello"}}} or {"query": {"match": {"demodata": "ell"}}}), I get zero hits. I don't know why. I set up my analyzer based on the following hints: Partial search

from elasticsearch import Elasticsearch
es = Elasticsearch()
settings = {
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "string",
                    "search_analyzer": "search_ngram",
                    "index_analyzer": "index_ngram"
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "filter": {
                "ngram_filter": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 8
                }
            },
            "analyzer": {
                "index_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["ngram_filter", "lowercase"]
                },
                "search_ngram": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": "lowercase"
                }
            }
        }
    }
}
es.indices.create(index="my-index", body=settings, ignore=400)
docs=[
    { "demodata": "hello" },
    { "demodata": "hi" },
    { "demodata": "bye" },
    { "demodata": "HelLo WoRld!" }
]
for doc in docs:
    res = es.index(index="my-index", doc_type="my-type", body=doc)

res = es.search(index="my-index", body={"query": {"match": {"demodata": "Hello"}}})
print("Got %d Hits:" % res["hits"]["total"])
print (res)

Updated code based on Piotr Pradzynski's input, but it is still not working!

from elasticsearch import Elasticsearch
es = Elasticsearch()
if not es.indices.exists(index="my-index"):
    customset={
        "settings": {
            "analysis": {
                "analyzer": {
                    "my_analyzer": {
                        "tokenizer": "my_tokenizer"
                    }
                },
                "tokenizer": {
                    "my_tokenizer": {
                        "type": "ngram",
                        "min_gram": 3,
                        "max_gram": 20,
                        "token_chars": [
                            "letter",
                            "digit"
                        ]
                    }
                }
            }
        }
    }


    es.indices.create(index="my-index", body=customset, ignore=400)
    docs=[
        { "demodata": "hELLO" },
        { "demodata": "hi" },
        { "demodata": "bye" },
        { "demodata": "HeLlo WoRld!" },
        { "demodata": "xyz@abc.com" }
    ]
    for doc in docs:
        res = es.index(index="my-index", doc_type="my-type", body=doc)



es.indices.refresh(index="my-index")
res = es.search(index="my-index", body={"query": {"match": {"demodata":{"query":"ell","analyzer": "my_analyzer"}}}})

#print res
print("Got %d Hits:" % res["hits"]["total"])
print (res)

I think you should use the NGram Tokenizer instead of the NGram Token Filter, and add a multi-field that uses this tokenizer.

Something like this:

PUT my-index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "ngram_analyzer": {
          "tokenizer": "ngram_tokenizer",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        }
      },
      "tokenizer": {
        "ngram_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 15,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "my-type": {
      "properties": {
        "demodata": {
          "type": "text",
          "fields": {
            "ngram": {
              "type": "text",
              "analyzer": "ngram_analyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }
  }
}
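
Since the question uses the Python client, here is a rough sketch of the same index creation through elasticsearch-py, assuming the index does not exist yet (the variable name ngram_index is mine; the body is just the PUT payload above):

from elasticsearch import Elasticsearch

es = Elasticsearch()

# Same settings/mappings as the PUT request above, passed as the create body.
ngram_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter": ["lowercase", "asciifolding"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 3,
                    "max_gram": 15,
                    "token_chars": ["letter", "digit"]
                }
            }
        }
    },
    "mappings": {
        "my-type": {
            "properties": {
                "demodata": {
                    "type": "text",
                    "fields": {
                        "ngram": {
                            "type": "text",
                            "analyzer": "ngram_analyzer",
                            "search_analyzer": "standard"
                        }
                    }
                }
            }
        }
    }
}

if not es.indices.exists(index="my-index"):
    es.indices.create(index="my-index", body=ngram_index)

Documents have to be indexed (or re-indexed) after this mapping exists, otherwise the demodata.ngram sub-field will not contain any terms.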

Then you have to use the added multi-field demodata.ngram when searching:

res = es.search(index="my-index", body={"query": {"match": {"demodata.ngram": "Hello"}}})
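
Note that this mapping ngram-izes only the index side (search_analyzer is standard), so a query string such as "ell" is matched as a whole term against the stored grams. If a partial query still returns nothing, one way to check what the analyzer actually emits is the analyze API; this snippet is only an illustrative sketch, not part of the original answer:

res = es.indices.analyze(index="my-index", body={"analyzer": "ngram_analyzer", "text": "HelLo WoRld!"})
# Each entry in res["tokens"] is one emitted term; with the settings above this
# should include lowercase 3- to 15-character grams such as "hel", "ell", "hello", ...
print([t["token"] for t in res["tokens"]])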

What you need is a query_string search.

https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

{
  "query":{
    "query_string":{
      "query":"demodata: *ell*"
    }
  }
}
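
For completeness, the same query_string search issued through the Python client could look like the sketch below; keep in mind that a leading wildcard such as *ell* forces Elasticsearch to scan many terms and can be slow on large indices:

res = es.search(index="my-index", body={
    "query": {
        "query_string": {
            "query": "demodata: *ell*"
        }
    }
})
print("Got %d Hits:" % res["hits"]["total"])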
