
Elasticsearch multiple values match without analyzer

Pardon my limited knowledge of Elasticsearch. I have an Elasticsearch index that contains documents like these:

{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"

    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"        
    }
}
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}

These 3 JSON documents are indexed in the ES server. I haven't provided any analyzer type (and don't know how to provide one either :)). I am using Spring Data Elasticsearch and executing the following query to search for the docs with region 'Masovian Voivodeship' or 'Federal District':

{
  "query_string" : {
    "query" : "Masovian Voivodeship OR Federal District",
    "fields" : [ "dimensions.region" ]
  }
}

I am expecting it to return 2 hits. However, it returns all 3 docs (probably because the 'Coimbra District' document also has 'District' in it). How can I modify the query so that it performs an EXACT match and returns only 2 documents? I am using the following method:

QueryBuilders.queryString(<OR string>).field("dimensions.region")

I have tried QueryBuilders.termsQuery, QueryBuilders.inQuery and QueryBuilders.matchQuery (with an array), but no luck.

Can anyone please help? Thanks in advance.

There are a couple of things you can do here.

To start, I set up an index without any explicit mapping or analysis, which means the standard analyzer will be used. That's important since it determines how we can query against the text fields.
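To make the effect of the standard analyzer concrete, here is a rough Python sketch (not Elasticsearch code) of what ends up in the index for `dimensions.region`. The helper names `approx_standard_tokens` and `build_inverted_index` are made up for illustration, and the tokenizer is only an approximation that lowercases and splits on non-word characters; the real analyzer follows Unicode text segmentation rules.

```python
import re
from collections import defaultdict

# Sample values copied from the question, keyed by document id.
REGIONS = {
    1: "Coimbra District",
    2: "Federal District",
    3: "Masovian Voivodeship",
}


def approx_standard_tokens(text):
    """Approximate the standard analyzer: split on non-word characters, lowercase."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def build_inverted_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in approx_standard_tokens(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


index = build_inverted_index(REGIONS)
for term in sorted(index):
    print(term, index[term])
```

With these three documents the term `district` points at documents 1 and 2, so any query that is satisfied by that single term will also pick up "Coimbra District". The full string "Federal District" is not a term in the index at all, which is consistent with un-analyzed, term-level lookups of the whole value finding nothing.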

So I started with:

DELETE /test_index

PUT /test_index
{
   "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 0
   }
}

PUT /test_index/doc/1
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 2,
    "dimensions": {
        "region": "Coimbra District"

    }
}

PUT /test_index/doc/2
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Federal District"        
    }
}

PUT /test_index/doc/3
{
    "date": "2013-12-30T00:00:00.000Z",
    "value": 1,
    "dimensions": {
        "region": "Masovian Voivodeship"
    }
}

Then I tried your query and got no hits. I don't understand why you have "dimensions.ga:region" in your fields parameter, but when I changed it to "dimensions.region" I got some results:

POST /test_index/doc/_search
{
   "query": {
      "query_string": {
         "query": "Masovian Voivodeship OR Federal District",
         "fields": [
            "dimensions.region"
         ]
      }
   }
}
...
{
   "took": 2,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 0.46911472,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 0.46911472,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Masovian Voivodeship"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.3533006,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Federal District"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "1",
            "_score": 0.05937162,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 2,
               "dimensions": {
                  "region": "Coimbra District"
               }
            }
         }
      ]
   }
}

However, this also returns the "Coimbra District" document, which you don't want. One way to fix that is as follows:

POST /test_index/doc/_search
{
   "query": {
      "query_string": {
         "query": "(Masovian AND Voivodeship) OR (Federal AND District)",
         "fields": [
            "dimensions.region"
         ]
      }
   }
}
...
{
   "took": 3,
   "timed_out": false,
   "_shards": {
      "total": 1,
      "successful": 1,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.46911472,
      "hits": [
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "3",
            "_score": 0.46911472,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Masovian Voivodeship"
               }
            }
         },
         {
            "_index": "test_index",
            "_type": "doc",
            "_id": "2",
            "_score": 0.3533006,
            "_source": {
               "date": "2013-12-30T00:00:00.000Z",
               "value": 1,
               "dimensions": {
                  "region": "Federal District"
               }
            }
         }
      ]
   }
}
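If the list of regions comes from application code, the grouped query string above can be generated rather than typed by hand. The following is a minimal Python sketch, assuming the values contain only plain words separated by whitespace; `grouped_query_string` is a hypothetical helper, and it does not escape characters that are reserved in `query_string` syntax.

```python
def grouped_query_string(values):
    """Join values with OR, requiring every word of each value via AND."""
    groups = []
    for value in values:
        words = value.split()
        if not words:
            continue  # skip empty or whitespace-only values
        groups.append("(" + " AND ".join(words) + ")")
    return " OR ".join(groups)


query = grouped_query_string(["Masovian Voivodeship", "Federal District"])
print(query)  # (Masovian AND Voivodeship) OR (Federal AND District)
```

Note that this still is not a strict exact match: a document whose region contains all the words of one group, in any order or alongside extra words, would also match.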

Another way to do it (I like this one better), which gives the same results, is to use a combination of a match query and a boolean should:

POST /test_index/doc/_search
{
   "query": {
      "bool": {
         "should": [
            {
               "match": {
                  "dimensions.region": {
                     "query": "Masovian Voivodeship",
                     "operator": "and"
                  }
               }
            },
            {
               "match": {
                  "dimensions.region": {
                     "query": "Federal District",
                     "operator": "and"
                  }
               }
            }
         ]
      }
   }
}
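The same request body can be assembled for any number of values. Here is a small Python sketch that builds it as a plain dictionary; `region_should_query` is a hypothetical helper name, and the default field is simply the one used in this thread. It only constructs the JSON and does not talk to a cluster.

```python
import json


def region_should_query(values, field="dimensions.region"):
    """Build a bool/should body with one AND-operator match clause per value."""
    clauses = [
        {"match": {field: {"query": value, "operator": "and"}}}
        for value in values
    ]
    return {"query": {"bool": {"should": clauses}}}


body = region_should_query(["Masovian Voivodeship", "Federal District"])
print(json.dumps(body, indent=2))
```

The printed JSON has the same shape as the request above, so it can be pasted into the console or sent with whatever HTTP client you use.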

Here is the code I used:

http://sense.qbox.io/gist/bb5062a635c4f9519a411fdd3c8540eae8bdfd51
