
How do I not match a bare hyphen in Elasticsearch?

I am querying Apache logs stored in Elasticsearch. I want to return log entries that come from a given hostname containing a hyphen and that have a populated auth field.

These strings should be an exact match: "hostname": "example-dev" and not "auth": "-".

My questions are:

  1. How do I correctly remap a type in Elasticsearch to allow a hyphen to be part of the matched string?
  2. How do I correctly query a type in Elasticsearch with a bare hyphen?

The hyphen is a reserved character in Elasticsearch, so I understand it takes special effort. However, I'm having a lot of trouble figuring out how to include it in my query.

I have tried to remap the type to be not_analyzed. It looks like the format has recently changed. The old way of defining the index ("analyzed", "not_analyzed", and "no") makes sense to me. The new way (true or false) does not. In either case, I cannot seem to get remapping to work.

Here is my attempt at remapping:

DELETE /search
PUT search
{
    "mappings" : {
        "beat" : {
            "properties" : {
                "hostname" : {
                    "type" : "text",
                    "norms" : false,
                    "index" : false
                }
            }
        }
    }
}

I have not included the remapping of the auth field because it only returns a mapper_parsing_exception.
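For comparison, here is a minimal sketch of a mapping body that would keep "example-dev" as a single indexed term. It assumes a version where the keyword type exists (5.x or later) and reuses the search index and beat type names from the attempt above; build_keyword_mapping is a hypothetical helper, not part of any client library. Note that "index": false in the attempt above marks the field as not indexed, which means it cannot be searched at all.

```python
import json


def build_keyword_mapping(type_name: str, field: str) -> dict:
    """Return a create-index body that maps `field` as an un-analyzed keyword."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # A keyword field is indexed as one whole term,
                    # so "example-dev" is not split at the hyphen.
                    field: {"type": "keyword"}
                }
            }
        }
    }


body = build_keyword_mapping("beat", "hostname")
# The printed JSON can be used as the body of `PUT search`.
print(json.dumps(body, indent=4))
```

The field stays indexed by default, so exact-value queries against it remain possible.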

I am using JSON to query Elasticsearch. Here is my query:

GET _search
{
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "beat.hostname": "example-dev"
                            }
                        }
                    ],
                    "must_not": [
                        {
                            "match": {
                                "auth.keyword": "-"
                            }
                        }
                    ]
                }
            }
        }
    }
}

I have tried escaping the hyphen with \\\\- but that returns results that match "auth": "-". The hostname still does not match exactly: the hostname query also matches something like "example-prod".
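The "example-prod" hit is consistent with how an analyzed text field behaves: the standard analyzer splits "example-dev" at the hyphen into "example" and "dev", and a match query by default accepts a document that contains either token. The sketch below is a toy simulation in Python, not Elasticsearch code; toy_analyze and toy_match are made-up names, and the regular expression only approximates the real tokenizer.

```python
import re


def toy_analyze(text: str) -> list:
    """Lowercase and split on non-alphanumerics (rough stand-in for the standard analyzer)."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def toy_match(query: str, field_value: str) -> bool:
    """Mimic `match` with the default OR operator: any shared token counts as a hit."""
    return bool(set(toy_analyze(query)) & set(toy_analyze(field_value)))


print(toy_analyze("example-dev"))                # ['example', 'dev']
print(toy_match("example-dev", "example-prod"))  # True: both contain 'example'
print(toy_analyze("-"))                          # []: a bare hyphen leaves no token
```

Under this model, escaping the hyphen cannot help on an analyzed field, because the hyphen never reaches the index in the first place.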

I have tried using "term" rather than "match"; that returns no results.

I can match a specific string for "auth"; for example, "must": { "match": { "auth": "foo" } } returns all entries for auth = "foo". That is the opposite of what I need, but it does work. The hostname is still not exactly matched if it includes a hyphen.

The log entries are parsed into Elasticsearch using the ELK stack; however, this will be a report that is generated outside of Kibana for legacy reasons.

I have read the documentation and examples, but there is a lot to dig through. Many of the examples I have found are for older versions of Elasticsearch, which is understandable, but confusing.

I am new to Elasticsearch. It feels like I am just overlooking something, but the problem might stem from a basic misunderstanding of how Elasticsearch is doing things.

After spending some more time with Elasticsearch queries, I think I have it figured out.

Splitting the hostname string into two separate strings and matching on both filters the hostname as expected. Using an empty string for the negative match also seems to work as expected.

Here is the updated query:

{
    "query": {
        "bool": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match": {
                                "beat.hostname": "example"
                            }
                        },
                        {
                            "match": {
                                "beat.hostname": "dev"
                            }
                        }
                    ],
                    "must_not": [
                        {
                            "match_phrase": {
                                "auth.keyword": ""
                            }
                        }
                    ]
                }
            }
        }
    }
}

I will do a bit more testing to make sure this is actually returning what I need.
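If the .keyword sub-fields exist, a term query is another option, since it skips analysis and compares the whole stored value. That is an assumption: default dynamic mappings in 5.x and later add a .keyword sub-field to string fields, and the query above already relies on auth.keyword, but beat.hostname.keyword is a guess. The following is an untested sketch of that alternative rather than the query shown above; build_exact_query is a hypothetical helper.

```python
import json


def build_exact_query(hostname: str, empty_auth: str = "-") -> dict:
    """Build a bool query: hostname must equal `hostname`, auth must not equal `empty_auth`."""
    return {
        "query": {
            "bool": {
                # term queries are not analyzed, so the hyphen needs no escaping
                "filter": [{"term": {"beat.hostname.keyword": hostname}}],
                "must_not": [{"term": {"auth.keyword": empty_auth}}],
            }
        }
    }


query = build_exact_query("example-dev")
# The printed JSON can be used as the body of `GET _search`.
print(json.dumps(query, indent=4))
```

Both clauses run in filter context, so no relevance scores are computed, which suits a report that only needs matching entries.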

I was trying too hard to make Elasticsearch fit what I expected. Instead of working with Elasticsearch, I was trying to fight against it.
