Elasticsearch: Problem with querying a document where "." is included in a field

I have an index in which some entries look like this:

{
    "name" : " Stefan Drumm"
}
...
{
    "name" : "Dr. med. Elisabeth Bauer"
}

The mapping of the name field is

{
  "name": {
    "type": "text",
    "analyzer": "index_name_analyzer",
    "search_analyzer": "search_cross_fields_analyzer"
  }
}

When I run the query below:

GET my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {"match": {"name": {"query": "Stefan Drumm", "operator": "AND"}}}
      ],
      "boost": 1.0
    }
  },
  "min_score": 0.0
}

It returns the first document.

But when I try to get the second document using the query below

GET my_index/_search
{
  "size": 10,
  "query": {
    "bool": {
      "must": [
        {"match": {"name": {"query": "Dr. med. Elisabeth Bauer", "operator": "AND"}}}
      ],
      "boost": 1.0
    }
  },
  "min_score": 0.0
}

it is not returning anything.

Things I can't do:

  1. I can't change the index.
  2. I can't use the term query.
  3. I can't change the operator to 'OR', because then the query would return multiple entries, which I don't want.

What am I doing wrong, and how can I achieve this by modifying the query?

You have configured different analyzers for indexing and searching (index_name_analyzer and search_cross_fields_analyzer). If these analyzers process the input Dr. med. Elisabeth Bauer in incompatible ways, the search isn't going to match. This is described in more detail in Index and search analysis, as well as in Controlling Analysis.

You don't provide the definitions of these two analyzers, so it's hard to tell from your question what they are doing. Depending on the analyzers, it may be possible to preprocess your query string (e.g. by removing the "." characters) before executing the search so that the search will match.
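As a rough illustration of that preprocessing idea, here is a minimal Python sketch. The helper names strip_periods and build_match_query are made up for this example and are not part of any Elasticsearch client. It only helps if the tokens stored in the index contain no periods, which you would need to confirm with the _analyze API first.

```python
import re


def strip_periods(text):
    """Replace every '.' with a space, then collapse runs of whitespace."""
    without_dots = text.replace(".", " ")
    return re.sub(r"\s+", " ", without_dots).strip()


def build_match_query(field, text):
    """Build a search body like the one in the question, using the cleaned text.

    Hypothetical helper: it only assembles a dict and sends nothing.
    """
    return {
        "size": 10,
        "query": {
            "bool": {
                "must": [
                    {
                        "match": {
                            field: {
                                "query": strip_periods(text),
                                "operator": "AND",
                            }
                        }
                    }
                ]
            }
        },
    }


body = build_match_query("name", "Dr. med. Elisabeth Bauer")
```

The resulting dict can be serialized to JSON and sent as the body of GET my_index/_search by whatever client you already use.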

You can investigate how analysis affects your search by using the _analyze API, as described in Testing analyzers. For your example, the commands

GET my_index/_analyze
{
  "analyzer": "index_name_analyzer", 
  "text":     "Dr. med. Elisabeth Bauer"
}

and

GET my_index/_analyze
{
  "analyzer": "search_cross_fields_analyzer", 
  "text":     "Dr. med. Elisabeth Bauer"
}

should show you how the two analyzers configured for your index treat the target string, which might give you a clue about what's wrong. The response will look something like this:

{
  "tokens": [
    {
      "token": "dr",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "med",
      "start_offset": 4,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "elisabeth",
      "start_offset": 9,
      "end_offset": 18,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "bauer",
      "start_offset": 19,
      "end_offset": 24,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}  

For the example output above, the analyzer has split the input into one token per word, lowercased each word, and discarded all punctuation.
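To compare the two responses mechanically, something like the following Python sketch could work. With "operator": "AND", every token produced at search time has to be present among the indexed tokens, so any search token missing from the index side explains an empty result. The function names and both sample responses are hypothetical; the index-side sample just assumes an analyzer that keeps the periods.

```python
def token_texts(analyze_response):
    """Extract the token strings from an _analyze response dict."""
    return [t["token"] for t in analyze_response.get("tokens", [])]


def missing_from_index(index_response, search_response):
    """Return search-time tokens that the index-time analysis did not produce."""
    indexed = set(token_texts(index_response))
    return [tok for tok in token_texts(search_response) if tok not in indexed]


# Hypothetical responses, trimmed to the "token" key for brevity.
index_side = {"tokens": [{"token": "dr."}, {"token": "med."},
                         {"token": "elisabeth"}, {"token": "bauer"}]}
search_side = {"tokens": [{"token": "dr"}, {"token": "med"},
                          {"token": "elisabeth"}, {"token": "bauer"}]}

print(missing_from_index(index_side, search_side))  # ['dr', 'med']
```

An empty list would mean the analyzers agree for this input, and the cause of the missing hit lies elsewhere.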

My guess would be that index_name_analyzer preserves punctuation, while search_cross_fields_analyzer discards it, so that the tokens won't match. If this is the case, and you can't change the index configuration (as you state in your question), one other option would be to specify a different analyzer when running the query:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "name": {
              "query": "Dr. med. Elisabeth Bauer",
              "operator": "AND",
              "analyzer": "index_name_analyzer"
            }
          }
        }
      ],
      "boost": 1
    }
  },
  "min_score": 0
}

In the query above, the analyzer parameter overrides the search-time analysis so that it uses the same analyzer (index_name_analyzer) as the one used when indexing. Which analyzer makes sense to use depends on your setup. Ideally, you should configure the analyzers to align so that you don't have to override at search time, but it sounds like you are not living in an ideal world.
