
How do I match entries in Elasticsearch that contain dashes (-)?

Elasticsearch tokenizes the data, so "this-that" gets split into 2 tokens. In case it makes a difference, I am using version 6.6 of ES. I have documents with several fields, such as title, description, name, etc. To give each document a unique identifier, the text in the title, "This that", gets slugified into "this-that". I am trying to query for "this-that" so that only that one document is returned. I have tried all sorts of things, including suggestions from other questions in this forum and the instructions on the https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html page. Unfortunately, nothing seems to work. Your help would be greatly appreciated. Thank you in advance.

<?php
require 'vendor/autoload.php';
use Elasticsearch\ClientBuilder;

$hosts = ['localhost:9200'];
$client = ClientBuilder::create()
    ->setHosts($hosts)
    ->build();

// Search the "shows" index for documents whose name field matches "this-that".
$params = [
    'index' => 'shows',
    'type' => '_doc',
    'body' => [
        'size'=> 10000,
        'query' => [
            'bool' => [
                'must' => [ 'match' => [ 'name' => 'this-that'] ],
            ]
        ]
    ]
];

$response = $client->search($params);

print_r($response);
?>

There are no errors, but it finds all instances with the word "this" and the word "that" in the name field. I would like to get just the one document back!

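You can experiment with analyzers and tokenizers using the Kibana console or plain HTTP requests to the _analyze endpoint. The standard analyzer splits the text at the dash, while the keyword analyzer keeps the whole input as a single token: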

curl -XPOST "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{  "analyzer": "standard",  "text": "this-that"}'
{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "that",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
curl -XPOST "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{  "analyzer": "keyword",  "text": "this-that"}'
{
  "tokens" : [
    {
      "token" : "this-that",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    }
  ]
}
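
To make the effect on the match query concrete, here is a small stand-alone sketch. It does not call Elasticsearch; the functions standardLikeTokens, keywordTokens and matchesAny are hypothetical helpers that only approximate the two analyzers and the default OR behaviour of a match query. The sample document names are made up for illustration.

```php
<?php
// Rough stand-in for the standard analyzer: lowercase, then split on anything
// that is not a letter or digit. An approximation, not the real tokenizer.
function standardLikeTokens(string $text): array
{
    return preg_split('/[^\p{L}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
}

// Stand-in for the keyword analyzer: the whole input becomes one unchanged token.
function keywordTokens(string $text): array
{
    return [$text];
}

// A match query with the default OR operator hits any document that shares
// at least one token with the analyzed query text.
function matchesAny(array $queryTokens, array $docTokens): bool
{
    return count(array_intersect($queryTokens, $docTokens)) > 0;
}

// The query text is analyzed the same way as the field, so it becomes two tokens.
$queryTokens = standardLikeTokens('this-that');

// Hypothetical sample names.
$docs = ['This that', 'This other show', 'That one'];
foreach ($docs as $doc) {
    $hit = matchesAny($queryTokens, standardLikeTokens($doc));
    echo $doc, ' => ', $hit ? 'match' : 'no match', "\n";
}
```

With the standard-like tokens, every sample name shares either "this" or "that" with the query, which is the behaviour described in the question. With a single keyword token, only an identical string would be equal.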

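To always get an exact match on a field, index it with the keyword analyzer so that the whole value is stored as a single token. The mappings below are written in the typeless form; on 6.x the properties block most likely has to be nested under the mapping type name (for example _doc). You can do it like this: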

PUT test-index
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "keyword"
      }
    }
  }
}

For matching purposes, this behaves the same as defining the field as a keyword type to begin with:

PUT test-index2
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword"
      }
    }
  }
}
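
As a hedged sketch of how this could look from the PHP client used in the question: the helpers buildKeywordIndexParams and buildExactNameSearch are hypothetical names, the index name 'shows' and type '_doc' are taken from the question, and the mapping is nested under the type name on the assumption that the cluster runs 6.x. The search uses a term query, which does not analyze the query text, so "this-that" is compared with the stored value as is.

```php
<?php
// Sketch only: index, type and field names follow the question.

// Index-creation parameters in the 6.x shape, with properties under the type name.
function buildKeywordIndexParams(string $index): array
{
    return [
        'index' => $index,
        'body'  => [
            'mappings' => [
                '_doc' => [
                    'properties' => [
                        'name' => ['type' => 'keyword'],
                    ],
                ],
            ],
        ],
    ];
}

// Search parameters for an exact match: a term query skips analysis of the query text.
function buildExactNameSearch(string $index, string $slug): array
{
    return [
        'index' => $index,
        'type'  => '_doc',
        'body'  => [
            'query' => [
                'term' => ['name' => $slug],
            ],
        ],
    ];
}

$createParams = buildKeywordIndexParams('shows');
$searchParams = buildExactNameSearch('shows', 'this-that');

// With a $client built as in the question, the calls would be:
// $client->indices()->create($createParams);
// $response = $client->search($searchParams);

// Print the query body so it can be pasted into the Kibana console for checking.
echo json_encode($searchParams['body']), "\n";
```

The type of an existing field cannot be changed in place, so applying this to an index that already holds data would mean creating a new index and reindexing. Keyword values are also case-sensitive, so the stored value has to be the slug "this-that" itself for the term query to find it.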
