
Elasticsearch exclude documents containing specific terms

I've indexed documents like the one below in Elasticsearch.

{    
    "category": "clothing (f)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}

There are categories such as clothing (m), clothing (f), etc. I am trying to exclude items in the clothing (m) category when the search is for female items. The query I am trying is:

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "description": "women's black shirt"
          }
        }
      ],
      "must_not": [
        {
          "term": {
            "category": "clothing (m)"
          }
        }
      ]
    }
  },
  "from": 0,
  "size": 50
}

But this is not working as expected. The results always include a few clothing (m) documents alongside the others. How can I exclude documents that have a particular category?

To exclude documents by a specific term (exact match), you have to query a field of the keyword datatype.

Keyword datatypes are typically used for filtering (Find me all blog posts where status is published), for sorting, and for aggregations. Keyword fields are only searchable by their exact value.

(Source: Elasticsearch documentation, Keyword Datatype)

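Your current query still returns clothing (m) documents because category was indexed as a text field. When you indexed your documents, the Elasticsearch standard analyzer split clothing (m) into the two tokens clothing and m.

A term query is not analyzed, so your must_not clause looks for the single token clothing (m). No such token exists in the index for a text field, so the clause matches nothing and excludes nothing.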

Text datatype fields are analyzed, that is they are passed through an analyzer to convert the string into a list of individual terms before being indexed.

Run this command:

POST my_index/_analyze
{
  "text": ["clothing (m)"]
}

Results:

{
  "tokens" : [
    {
      "token" : "clothing",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "m",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
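For comparison, the same text can be run through the analysis chain of the keyword sub-field. This is a minimal sketch that assumes category has a category.keyword sub-field, as in the default dynamic mapping shown in the working example of this answer. The response should contain a single token holding the whole string clothing (m), which is the value a term query has to match.

```
# Assumes my_index maps category with a "keyword" sub-field (default dynamic mapping)
POST my_index/_analyze
{
  "field": "category.keyword",
  "text": ["clothing (m)"]
}
```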

A working example:

Assuming your mappings look like this (the default dynamic mapping for string fields):

{
 "my_index" : {
    "mappings" : {
      "properties" : {
        "category" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "description" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "name" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        },
        "price" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}
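If you control the mapping and never need full-text search on category, one alternative is to map it as keyword from the start. The following is only a sketch: the index name my_products is hypothetical, it assumes Elasticsearch 7.x or later (typeless mappings), and the remaining fields are left to dynamic mapping.

```
# Hypothetical index name; assumes Elasticsearch 7.x or later
PUT my_products
{
  "mappings": {
    "properties": {
      "category": { "type": "keyword" }
    }
  }
}
```

With a mapping like this, the term clause would target category directly instead of category.keyword. The rest of this answer keeps using my_index and the category.keyword sub-field.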

Let's post a few documents:

POST my_index/_doc/1
{
    "category": "clothing (f)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}


POST my_index/_doc/2
{
    "category": "clothing (m)",
    "description": "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
    "name": "Women's Unstoppable Graphic T-Shirt",
    "price": "$34.99"
}

Now our query should look like this:

GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": {
          "description": "women's black shirt"
        }
      },
      "filter": {
        "bool": {
          "must_not": {
            "term": {
              "category.keyword": "clothing (m)"
            }
          }
        }
      }
    }
  },
  "from": 0,
  "size": 50
}

The results:

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}
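The same exclusion can also be written with must_not directly in the outer bool, without the nested filter. This is a sketch against the same index and documents. Because must_not clauses already run in filter context and do not contribute to the score, it should return the same hit as the query above.

```
# Same index and documents as above; must_not placed directly in the outer bool
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "description": "women's black shirt" }
      },
      "must_not": {
        "term": { "category.keyword": "clothing (m)" }
      }
    }
  },
  "from": 0,
  "size": 50
}
```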

Results when the term query targets category instead of category.keyword:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.43301374,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (f)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.43301374,
        "_source" : {
          "category" : "clothing (m)",
          "description" : "Women's Unstoppable Graphic T-Shirt - Women’s Short Sleeve Shirt",
          "name" : "Women's Unstoppable Graphic T-Shirt",
          "price" : "$34.99"
        }
      }
    ]
  }
}

As you can see, the last results also include clothing (m). As a side note, don't use term queries on text fields; use match instead.
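To illustrate that last point, here is a rough sketch of an exclusion that stays on the analyzed category field and uses match. The "operator": "and" setting is what makes it work: the query text is analyzed into clothing and m, and only documents whose category contains both tokens are excluded. With the default "or" operator, clothing (f) would be excluded as well, because it shares the token clothing.

```
# Sketch only: excludes documents whose category contains both tokens "clothing" and "m"
GET my_index/_search
{
  "query": {
    "bool": {
      "must": {
        "match": { "description": "women's black shirt" }
      },
      "must_not": {
        "match": {
          "category": { "query": "clothing (m)", "operator": "and" }
        }
      }
    }
  }
}
```

This is looser than the keyword approach, since any category that contains both tokens is excluded, so the term query on category.keyword remains the more precise option.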

Hope this helps.
