Elasticsearch tokenizes the data, so "this-that" gets split into two tokens.

I am using version 6.6 of ES, in case it makes a difference. I have documents with several fields, such as title, descriptions, name, etc. To give each document a unique identifier, the title text "This that" gets slugified into "this-that". I am trying to query for "this-that" so that only that one document is returned.

I have tried all sorts of things, including suggestions from other questions in this forum and the instructions on the https://www.elastic.co/guide/en/elasticsearch/reference/current/analyzer.html page. Unfortunately, nothing seems to work. Your help would be greatly appreciated. Thank you in advance.
<?php
require 'vendor/autoload.php';

use Elasticsearch\ClientBuilder;

// Connect to the local Elasticsearch node
$hosts = ['localhost:9200'];
$client = ClientBuilder::create()
    ->setHosts($hosts)
    ->build();

// Full-text match query on the "name" field of the "shows" index
$params = [
    'index' => 'shows',
    'type'  => '_doc',
    'body'  => [
        'size'  => 10000,
        'query' => [
            'bool' => [
                'must' => [
                    'match' => ['name' => 'this-that'],
                ],
            ],
        ],
    ],
];

$response = $client->search($params);
print_r($response);
There are no errors, but the query returns every document whose name field contains either the word "this" or the word "that". I would like to get back just the one document!
You can experiment with analyzers and tokenizers from the Kibana console or over plain HTTP with the _analyze API:
curl -XPOST "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{ "analyzer": "standard", "text": "this-that"}'
{
  "tokens" : [
    {
      "token" : "this",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "that",
      "start_offset" : 5,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}
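The standard analyzer, which text fields use by default, splits the value on the hyphen, so "this-that" is indexed as the two tokens "this" and "that". A match query runs its input through the same analyzer and, with its default OR operator, returns every document containing either token. The keyword analyzer, by contrast, keeps the whole input as a single token:
curl -XPOST "http://localhost:9200/_analyze" -H 'Content-Type: application/json' -d'{ "analyzer": "keyword", "text": "this-that"}'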
{
  "tokens" : [
    {
      "token" : "this-that",
      "start_offset" : 0,
      "end_offset" : 9,
      "type" : "word",
      "position" : 0
    }
  ]
}
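Since the question uses the PHP client, the same _analyze requests can also be sent from PHP through the client's indices()->analyze() method. This is a minimal sketch, assuming the $client built in the question; analyzeParams is a hypothetical helper that only builds the request array, so the snippet itself runs without a cluster.

```php
<?php
// Build the request array for the _analyze API, mirroring the JSON
// bodies of the curl examples: which analyzer to run and on what text.
function analyzeParams(string $analyzer, string $text): array
{
    return [
        'body' => [
            'analyzer' => $analyzer,
            'text'     => $text,
        ],
    ];
}

// Hypothetical usage with the client from the question (needs a running node):
// $result = $client->indices()->analyze(analyzeParams('standard', 'this-that'));
// foreach ($result['tokens'] as $t) {
//     echo $t['token'], PHP_EOL;
// }
```

Running it with 'standard' and then 'keyword' shows the same difference as the two curl calls: two tokens versus one.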
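To always get an exact match on a field, the field has to be indexed with keyword tokenization, so that the whole value is stored as one token. On Elasticsearch 6.x (the question uses 6.6) the mapping must still be nested under a type name such as _doc. The analyzer of an existing field cannot be changed in place, so this means creating a new index and reindexing the data into it:
PUT test-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "text",
          "analyzer": "keyword"
        }
      }
    }
  }
}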
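The same index can be created from PHP with the client's indices()->create() method, which takes the mapping as a nested array. This is a minimal sketch, assuming the $client from the question; keywordAnalyzedIndexParams is a hypothetical helper name, and the _doc level reflects the mapping-type requirement of Elasticsearch 6.x.

```php
<?php
// Request array for creating an index whose "name" field is a text field
// analyzed with the built-in keyword analyzer (whole value = one token).
// ES 6.x expects a mapping type name (here "_doc") under "mappings".
function keywordAnalyzedIndexParams(string $index): array
{
    return [
        'index' => $index,
        'body'  => [
            'mappings' => [
                '_doc' => [
                    'properties' => [
                        'name' => [
                            'type'     => 'text',
                            'analyzer' => 'keyword',
                        ],
                    ],
                ],
            ],
        ],
    ];
}

// Hypothetical usage (needs a running node):
// $client->indices()->create(keywordAnalyzedIndexParams('test-index'));
```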
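For matching purposes this behaves the same as declaring the field as a keyword type in the first place, which is usually the simpler option, since keyword fields also support sorting and aggregations by default (shown here nested under the _doc type, as 6.x requires):
PUT test-index2
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "type": "keyword"
        }
      }
    }
  }
}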
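With the field indexed as a single token, the PHP search from the question can switch from a match query to a term query, which does not analyze its input. This is a minimal sketch: exactMatchParams is a hypothetical helper, and it assumes either the test-index2 mapping above or an existing shows index created by dynamic mapping, where Elasticsearch 6.x normally adds a name.keyword sub-field of type keyword (worth confirming with GET shows/_mapping before relying on it).

```php
<?php
// Search parameters for an exact match on a keyword field. A term query
// does not analyze its input, so "this-that" is compared as one value.
function exactMatchParams(string $index, string $field, string $value): array
{
    return [
        'index' => $index,
        'body'  => [
            'query' => [
                'term' => [$field => $value],
            ],
        ],
    ];
}

// Field "name" mapped as keyword, as in test-index2:
$params = exactMatchParams('test-index2', 'name', 'this-that');

// Assumed existing index built by dynamic mapping: query the
// "name.keyword" sub-field instead, which needs no reindexing.
$paramsExisting = exactMatchParams('shows', 'name.keyword', 'this-that');

// Hypothetical usage (needs a running node):
// $response = $client->search($paramsExisting);
```

A term query compares the stored value exactly, with no lowercasing, so "This-That" would not match a document whose name is "this-that". The dynamically created name.keyword sub-field also skips values longer than 256 characters (its default ignore_above), which should not matter for short slugs.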