
Why is ElasticSearch match query returning all results?

I have the following ElasticSearch query, which I expected to return only the documents whose email field equals myemail@gmail.com:

"query": {
  "bool": {
    "must": [
      {
        "match": {
          "email": "myemail@gmail.com"
        }
      }
    ]
  }
}

The mapping for the user type that is being searched is the following:

    {
      "users": {
        "mappings": {
          "user": {
            "properties": {
              "email": {
                "type": "string"
              },
              "name": {
                "type": "string",
                "fields": {
                  "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                  }
                }
              },
              "nickname": {
                "type": "string"
              }
            }
          }
        }
      }
    }

The following is a sample of results returned from ElasticSearch

 [
   {
     "_index": "users",
     "_type": "user",
     "_id": "54b19c417dcc4fe40d728e2c",
     "_score": 0.23983537,
     "_source": {
       "email": "johnsmith@gmail.com",
       "name": "John Smith",
       "nickname": "jsmith"
     }
   },
   {
     "_index": "users",
     "_type": "user",
     "_id": "9c417dcc4fe40d728e2c54b1",
     "_score": 0.23983537,
     "_source": {
       "email": "myemail@gmail.com",
       "name": "Walter White",
       "nickname": "wwhite"
     }
   },
   {
     "_index": "users",
     "_type": "user",
     "_id": "4fe40d728e2c54b19c417dcc",
     "_score": 0.23983537,
     "_source": {
       "email": "JimmyFallon@gmail.com",
       "name": "Jimmy Fallon",
       "nickname": "jfallon"
     }
   }
 ]

Given the above query, I expected a hit to require an exact match with 'myemail@gmail.com' as the email property value.

How does the ElasticSearch DSL query need to change in order to return only exact matches on email?

The email field was tokenized (analyzed), which is the reason for this anomaly. When you indexed the document, the value was split into tokens:

"myemail@gmail.com" => [ "myemail" , "gmail.com" ]

This way, a search for myemail OR gmail.com matches the document. The analyzer is also applied to the search query, so when you search for john@gmail.com, the query gets broken into

"john@gmail.com" => [ "john" , "gmail.com" ]

Because the "gmail.com" token is common to the search term and the indexed term, you get a match.
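The effect can be reproduced outside ElasticSearch with a small Python sketch. The `analyze` function below is only a rough stand-in that splits on "@" and whitespace so that it yields the tokens shown above; it is not the real analyzer. The names `analyze` and `match` are made up for this illustration.

```python
import re

def analyze(text):
    # Rough stand-in for the default analyzer (an assumption, not the real one):
    # lowercase the text, then split on '@' and whitespace.
    return [tok for tok in re.split(r"[@\s]+", text.lower()) if tok]

def match(query, field_value):
    # Mimics the default OR behavior of a match query:
    # a single shared token is enough for a hit.
    return bool(set(analyze(query)) & set(analyze(field_value)))

emails = ["johnsmith@gmail.com", "myemail@gmail.com", "JimmyFallon@gmail.com"]
hits = [e for e in emails if match("myemail@gmail.com", e)]

print(analyze("myemail@gmail.com"))  # ['myemail', 'gmail.com']
print(hits)  # all three addresses, since each one contains the 'gmail.com' token
```

Every address in the sample shares the "gmail.com" token with the query, which is why all three documents come back.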

To override this behavior, declare the email field as not_analyzed. Tokenization then won't happen, and the entire string will be indexed as a single term.

With "not_analyzed"

"john@gmail.com" => [ "john@gmail.com" ]

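So change the mapping to the following and you should be good. Note that the index setting of an existing field cannot be changed in place, so you will need to recreate the index with this mapping and reindex the documents: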

{
  "users": {
    "mappings": {
      "user": {
        "properties": {
          "email": {
            "type": "string",
            "index": "not_analyzed"
          },
          "name": {
            "type": "string",
            "fields": {
              "raw": {
                "type": "string",
                "index": "not_analyzed"
              }
            }
          },
          "nickname": {
            "type": "string"
          }
        }
      }
    }
  }
}
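To see what this changes, here is a minimal Python sketch of how matching behaves once the field is not_analyzed. It is a simplified model, not ElasticSearch code: `index_terms` and `match_exact` are hypothetical names, and the only assumption is that a not_analyzed value is kept as one unmodified term.

```python
def index_terms(value):
    # A not_analyzed field keeps the whole string as a single term, unchanged.
    return [value]

def match_exact(query, field_value):
    # With one term on each side, a hit requires the full strings to be equal.
    return bool(set(index_terms(query)) & set(index_terms(field_value)))

emails = ["johnsmith@gmail.com", "myemail@gmail.com", "JimmyFallon@gmail.com"]
hits = [e for e in emails if match_exact("myemail@gmail.com", e)]

print(hits)  # ['myemail@gmail.com']
print(match_exact("MyEmail@gmail.com", "myemail@gmail.com"))  # False
```

The second call shows a side effect worth knowing: since no analysis happens, the value is not lowercased either, so matching becomes case-sensitive.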

I have described the problem more precisely, along with another approach to solving it, here.


 