Elasticsearch: Faceted query with terms returning unexpected result

I am trying to run a faceted query on some logs that I have stored in ES. The logs look something like this:

{"severity": "informational","message_hash_value": "00016B15", "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", "host": "192.168.8.225", "version": "1.0", "user": "User_1@test.co", "created_timestamp": "2013-03-01T15:34:00", "message": "User viewed contents", "inserted_timestamp": "2013-03-01T15:34:00"}

The query that I am trying to run is:

curl -XGET 'http://127.0.0.1:9200/logs-*/logs/_search' -d '{
    "from": 0,
    "size": 0,
    "facets": {
        "user": {
            "terms": {"field": "user", "size": 999999}
        }
    }
}'

Notice that the "user" field in the logs is an email address. The problem is that the terms facet query I use returns the following list of terms for the "user" field:

u'facets': {u'user': {u'_type': u'terms', u'total': 2004, u'terms': [{u'count': 1002,u'term': u'test.co'}, {u'count': 320, u'term': u'user_1'}, {u'count': 295,u'term': u'user_2'}

Note that the list contains the term

{u'count': 1002,u'term': u'test.co'}

which is the domain name of the users' email addresses. Why is Elasticsearch treating the domain as a separate term?

Running a query to check the mappings

curl -XGET 'http://127.0.0.1:9200/logs-*/_mapping?pretty=true'

yields the following for the "user" field:

"user" : {
      "type" : "string"
    },
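
In Elasticsearch versions of this era, a "string" field with no "index" setting defaults to "analyzed", which is exactly what the mapping above shows. A minimal Python sketch that makes that default explicit, assuming the mapping response has already been decoded into a dict; `is_analyzed` is a hypothetical helper, not part of any client library:

```python
def is_analyzed(field_mapping: dict) -> bool:
    """Return True if a pre-5.x Elasticsearch field mapping will be analyzed.

    ASSUMPTION: old-style mappings, where "string" fields default to
    "index": "analyzed" unless "not_analyzed" (or "no") is set explicitly.
    """
    if field_mapping.get("type") != "string":
        return False  # this sketch only reasons about string fields
    return field_mapping.get("index", "analyzed") == "analyzed"


# The "user" mapping reported above has no "index" key, so it is analyzed.
print(is_analyzed({"type": "string"}))                           # True
print(is_analyzed({"type": "string", "index": "not_analyzed"}))  # False
```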

This happens because Elasticsearch's default analyzer (the standard analyzer) splits text on "@" at index time, in addition to whitespace and most punctuation, and lowercases the resulting tokens. The terms facet then reports those individual tokens rather than whole field values, so "User_1@test.co" is counted as "user_1" and "test.co". You can get around this issue by telling Elasticsearch not to run an analyzer on this field, but you will have to reindex all of your data, because an existing field cannot be switched from analyzed to not analyzed in place.
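
The effect can be simulated outside Elasticsearch. The sketch below is a deliberately crude stand-in for the standard analyzer (the real one follows Lucene's Unicode word-break rules); `approx_standard_tokens` and `terms_facet` are hypothetical names, and the sample values are made up for illustration:

```python
import re
from collections import Counter
from typing import List


def approx_standard_tokens(value: str) -> List[str]:
    """Crude stand-in for the standard analyzer on simple email-like values.

    ASSUMPTION: lowercasing and splitting on anything that is not a letter,
    digit, "_" or "." is close enough for inputs like "User_1@test.co".
    The real analyzer will differ on many other inputs.
    """
    raw = re.findall(r"[\w.]+", value.lower())
    # Trim dots at token edges (e.g. a sentence-final ".") and drop empties.
    return [tok.strip(".") for tok in raw if tok.strip(".")]


def terms_facet(values: List[str], analyzed: bool) -> Counter:
    """Count how many documents contain each term, roughly like a terms facet."""
    counts = Counter()
    for value in values:
        terms = approx_standard_tokens(value) if analyzed else [value]
        counts.update(set(terms))  # a document counts once per distinct term
    return counts


# Hypothetical sample of "user" values from three log documents.
users = ["User_1@test.co", "User_2@test.co", "User_1@test.co"]
print(terms_facet(users, analyzed=True).most_common())
# [('test.co', 3), ('user_1', 2), ('user_2', 1)]
print(terms_facet(users, analyzed=False).most_common())
# [('User_1@test.co', 2), ('User_2@test.co', 1)]
```

With analyzed values every document contributes both its local part and its domain. That is consistent with the question's output, where test.co has a count of 1002 and the facet total is 2004, i.e. two terms per document.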

Create your new index:

curl -XPUT 'http://localhost:9200/logs-new'

Specify in this new index's mapping that you don't want the "user" field to be analyzed:

curl -XPUT 'http://localhost:9200/logs-new/logs/_mapping' -d '{
    "logs" : {
        "properties" : {
            "user" : {
                "type" : "string", 
                "index" : "not_analyzed"
            }
        }
    }
}'
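
Since the asker appears to be working from Python, here is a minimal standard-library sketch that builds (but does not send) the same mapping request. `mapping_request` is a hypothetical helper; the body uses the same pre-5.x syntax as the curl command above ("string" plus "not_analyzed"), which Elasticsearch 5.x and later express as a "keyword" field instead. The Content-Type header is assumed harmless on old versions and is required from 6.0 on.

```python
import json
import urllib.request

# Same mapping as the curl command above (pre-5.x "string" + "not_analyzed").
MAPPING = {
    "logs": {
        "properties": {
            "user": {"type": "string", "index": "not_analyzed"}
        }
    }
}


def mapping_request(base_url: str, index: str, doc_type: str) -> urllib.request.Request:
    """Build (but do not send) the PUT request that installs MAPPING."""
    url = f"{base_url}/{index}/{doc_type}/_mapping"
    body = json.dumps(MAPPING).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


req = mapping_request("http://localhost:9200", "logs-new", "logs")
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
```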

Index a document:

curl -XPOST 'http://localhost:9200/logs-new/logs' -d '{
    "created_timestamp": "2013-03-01T15:34:00", 
    "host": "192.168.8.225", 
    "inserted_timestamp": "2013-03-01T15:34:00", 
    "message": "User viewed contents", 
    "message_hash_value": "00016B15", 
    "severity": "informational", 
    "user": "User_1@test.co", 
    "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0.1", 
    "version": "1.0"
}'
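
Reindexing all existing logs one POST at a time would be slow. Elasticsearch's _bulk endpoint accepts newline-delimited pairs of action and source lines instead. A minimal sketch, assuming the documents have already been read from the old logs-* indices (for example with a scroll search, not shown) and decoded into dicts; `bulk_body` is a hypothetical helper, and the "_type" key matches the typed indices used in this answer:

```python
import json
from typing import Dict, List


def bulk_body(index: str, doc_type: str, docs: List[Dict]) -> str:
    """Build a newline-delimited _bulk body that indexes every doc into `index`.

    Each document becomes two lines: an "index" action line, then its source.
    The _bulk API requires the body to end with a trailing newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


# Hypothetical documents, assumed to have been read from the old logs-* indices.
docs = [
    {"user": "User_1@test.co", "message": "User viewed contents"},
    {"user": "User_2@test.co", "message": "User viewed contents"},
]
body = bulk_body("logs-new", "logs", docs)
print(body, end="")
# POST this body to http://localhost:9200/_bulk to index both documents at once.
```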

The Elasticsearch facet will now display the entire email address:

curl -XGET 'http://localhost:9200/logs-new/logs/_search?pretty' -d '{
    "from":0, 
    "size":0, 
    "facets" : { 
         "user" : { 
            "terms" : {
                "field" : "user", 
                "size" : 999999 
            }
        } 
    }
}'

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ ]
  },
  "facets" : {
    "user" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 1,
      "other" : 0,
      "terms" : [ {
        "term" : "User_1@test.co",
        "count" : 1
      } ]
    }
  }
}
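
Reading the counts out of a decoded response is a one-liner per format. Facets were deprecated in Elasticsearch 1.0 and removed in 2.0 in favor of aggregations, so this sketch handles both: the facet branch reads the response shown above, and the aggregation branch assumes the usual terms-aggregation shape ("aggregations" with "buckets" of "key"/"doc_count"). `user_counts` is a hypothetical helper.

```python
import json

# The facet response shown above, trimmed to the parts this sketch reads.
RESPONSE = json.loads("""
{
  "hits": {"total": 1, "max_score": 1.0, "hits": []},
  "facets": {
    "user": {
      "_type": "terms", "missing": 0, "total": 1, "other": 0,
      "terms": [{"term": "User_1@test.co", "count": 1}]
    }
  }
}
""")


def user_counts(response: dict, name: str = "user") -> dict:
    """Map each term to its count, from a terms facet or a terms aggregation."""
    if "facets" in response:  # pre-2.0 facet response, as above
        return {t["term"]: t["count"] for t in response["facets"][name]["terms"]}
    # Terms aggregation response (the replacement for facets in later versions).
    return {b["key"]: b["doc_count"] for b in response["aggregations"][name]["buckets"]}


print(user_counts(RESPONSE))  # {'User_1@test.co': 1}
```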

References:
Core Types: http://www.elasticsearch.org/guide/reference/mapping/core-types/
Reindexing with a new mapping: https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tCaXgjfUFVU
