
In Logstash, how do I limit the depth of JSON properties in my logs that are turned into index fields in Elasticsearch?

I'm fairly new to the Elastic Stack. I'm using Logstash 6.4.0 to load JSON log data from Filebeat 6.4.0 into Elasticsearch 6.4.0. I'm finding that way too many JSON properties get converted into fields once I start using Kibana 6.4.0.

I know this because when I navigate to Kibana Discover and put in my index of logstash-*, I get an error message that states:

Discover: Trying to retrieve too many docvalue_fields. Must be less than or equal to: [100] but was [106]. This limit can be set by changing the [index.max_docvalue_fields_search] index level setting.
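
I realize the error message itself points at a workaround: index.max_docvalue_fields_search is a dynamic index setting, so the cap can be raised per index. A minimal sketch against the Elasticsearch settings API (console syntax, assuming a local cluster) would look like the following, but raising that cap only treats the symptom, not the field explosion itself:

PUT logstash-*/_settings
{
  "index.max_docvalue_fields_search": 200
}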

If I navigate to Management > Kibana > Index Patterns, I see that I have 940 fields. It appears that each child property of my root JSON object (and many of those child properties have JSON objects as values, and so on) is automatically being parsed and used to create fields in my Elasticsearch logstash-* index.

So here's my question: how can I limit this automatic creation? Is it possible to do this by property depth? Is it possible to do this some other way?

Here is my Filebeat configuration (minus the comments):

filebeat.inputs:
- type: log
  enabled: true
  paths:
  - d:/clients/company-here/rpms/logs/rpmsdev/*.json
  json.keys_under_root: true
  json.add_error_key: true

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

setup.kibana:

output.logstash:
  hosts: ["localhost:5044"]

Here is my current Logstash pipeline configuration:

input {
    beats {
        port => "5044"
    }
}
filter {
    date {
        match => [ "@timestamp" , "ISO8601"]
    }
}
output {
    stdout { 
        #codec => rubydebug 
    }
    elasticsearch {
        hosts => [ "localhost:9200" ]
    }
}

Here is an example of a single log message that I am shipping (one row of my log file). Note that the JSON is completely dynamic and can change depending on what's being logged:

{
    "@timestamp": "2018-09-06T14:29:32.128",
    "level": "ERROR",
    "logger": "RPMS.WebAPI.Filters.LogExceptionAttribute",
    "message": "Log Exception: RPMS.WebAPI.Entities.LogAction",
    "eventProperties": {
        "logAction": {
            "logActionId": 26268916,
            "performedByUserId": "b36778be-6181-4b69-a0fe-e3a975ddcdd7",
            "performedByUserName": "test.sga.danny@domain.net",
            "performedByFullName": "Mike Manley",
            "controller": "RpmsToMainframeOperations",
            "action": "UpdateStoreItemPricing",
            "actionDescription": "Exception while updating store item pricing for store item with storeItemId: 146926. An error occurred while sending the request. InnerException: Unable to connect to the remote server InnerException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond 10.1.1.133:8800",
            "url": "http://localhost:49399/api/RpmsToMainframeOperations/UpdateStoreItemPricing/146926",
            "verb": "PUT",
            "statusCode": 500,
            "status": "Internal Server Error - Exception",
            "request": {
                "itemId": 648,
                "storeId": 13,
                "storeItemId": 146926,
                "changeType": "price",
                "book": "C",
                "srpCode": "",
                "multi": 0,
                "price": "1.27",
                "percent": 40,
                "keepPercent": false,
                "keepSrp": false
            },
            "response": {
                "exception": {
                    "ClassName": "System.Net.Http.HttpRequestException",
                    "Message": "An error occurred while sending the request.",
                    "Data": null,
                    "InnerException": {
                        "ClassName": "System.Net.WebException",
                        "Message": "Unable to connect to the remote server",
                        "Data": null,
                        "InnerException": {
                            "NativeErrorCode": 10060,
                            "ClassName": "System.Net.Sockets.SocketException",
                            "Message": "A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond",
                            "Data": null,
                            "InnerException": null,
                            "HelpURL": null,
                            "StackTraceString": "   at System.Net.Sockets.Socket.InternalEndConnect(IAsyncResult asyncResult)\r\n   at System.Net.Sockets.Socket.EndConnect(IAsyncResult asyncResult)\r\n   at System.Net.ServicePoint.ConnectSocketInternal(Boolean connectFailure, Socket s4, Socket s6, Socket& socket, IPAddress& address, ConnectSocketState state, IAsyncResult asyncResult, Exception& exception)",
                            "RemoteStackTraceString": null,
                            "RemoteStackIndex": 0,
                            "ExceptionMethod": "8\nInternalEndConnect\nSystem, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089\nSystem.Net.Sockets.Socket\nVoid InternalEndConnect(System.IAsyncResult)",
                            "HResult": -2147467259,
                            "Source": "System",
                            "WatsonBuckets": null
                        },
                        "HelpURL": null,
                        "StackTraceString": "   at System.Net.HttpWebRequest.EndGetRequestStream(IAsyncResult asyncResult, TransportContext& context)\r\n   at System.Net.Http.HttpClientHandler.GetRequestStreamCallback(IAsyncResult ar)",
                        "RemoteStackTraceString": null,
                        "RemoteStackIndex": 0,
                        "ExceptionMethod": "8\nEndGetRequestStream\nSystem, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089\nSystem.Net.HttpWebRequest\nSystem.IO.Stream EndGetRequestStream(System.IAsyncResult, System.Net.TransportContext ByRef)",
                        "HResult": -2146233079,
                        "Source": "System",
                        "WatsonBuckets": null
                    },
                    "HelpURL": null,
                    "StackTraceString": "   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()\r\n   at RPMS.WebAPI.Infrastructure.RpmsToMainframe.RpmsToMainframeOperationsManager.<PerformOperationInternalAsync>d__14.MoveNext() in D:\\Century\\Clients\\PigglyWiggly\\RPMS\\PWADC.RPMS\\RPMSDEV\\RPMS.WebAPI\\Infrastructure\\RpmsToMainframe\\RpmsToMainframeOperationsManager.cs:line 114\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()\r\n   at RPMS.WebAPI.Infrastructure.RpmsToMainframe.RpmsToMainframeOperationsManager.<PerformOperationAsync>d__13.MoveNext() in D:\\Century\\Clients\\PigglyWiggly\\RPMS\\PWADC.RPMS\\RPMSDEV\\RPMS.WebAPI\\Infrastructure\\RpmsToMainframe\\RpmsToMainframeOperationsManager.cs:line 96\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()\r\n   at RPMS.WebAPI.Controllers.RpmsToMainframe.RpmsToMainframeOperationsController.<UpdateStoreItemPricing>d__43.MoveNext() in D:\\Century\\Clients\\PigglyWiggly\\RPMS\\PWADC.RPMS\\RPMSDEV\\RPMS.WebAPI\\Controllers\\RpmsToMainframe\\RpmsToMainframeOperationsController.cs:line 537\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Threading.Tasks.TaskHelpersExtensions.<CastToObject>d__1`1.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Controllers.ApiControllerActionInvoker.<InvokeActionAsyncCore>d__1.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at 
System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<CallOnActionExecutedAsync>d__6.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Filters.ActionFilterAttribute.<ExecuteActionFilterAsyncCore>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Controllers.ActionFilterResult.<ExecuteAsync>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Filters.AuthorizationFilterAttribute.<ExecuteAuthorizationFilterAsyncCore>d__3.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Controllers.AuthenticationFilterResult.<ExecuteAsync>d__5.MoveNext()\r\n--- End of stack trace from previous location where exception was thrown ---\r\n   at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)\r\n   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)\r\n   at System.Web.Http.Controllers.ExceptionFilterResult.<ExecuteAsync>d__6.MoveNext()",
                    "RemoteStackTraceString": null,
                    "RemoteStackIndex": 0,
                    "ExceptionMethod": "8\nThrowForNonSuccess\nmscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089\nSystem.Runtime.CompilerServices.TaskAwaiter\nVoid ThrowForNonSuccess(System.Threading.Tasks.Task)",
                    "HResult": -2146233088,
                    "Source": "mscorlib",
                    "WatsonBuckets": null,
                    "SafeSerializationManager": {
                        "m_serializedStates": [{

                        }]
                    },
                    "CLR_SafeSerializationManager_RealType": "System.Net.Http.HttpRequestException, System.Net.Http, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a"
                }
            },
            "performedAt": "2018-09-06T14:29:32.1195316-05:00"
        }
    },
    "logAction": "RPMS.WebAPI.Entities.LogAction"
}

I never ultimately found a way to limit the depth of the automatic field creation. I also posted my question in the Elastic forums and never got an answer. Between the time of my post and now, I have learned a lot more about Logstash.

My ultimate solution was to extract the JSON properties that I needed as fields, and then use the GREEDYDATA pattern in a grok filter to place the rest of the properties into an unextractedJson field, so that I could still query for values within that field in Elasticsearch.

Here is my new Filebeat configuration (minus the comments):

filebeat.inputs:
- type: log
  enabled: true
  paths:
  - d:/clients/company-here/rpms/logs/rpmsdev/*.json
  #json.keys_under_root: true
  json.add_error_key: true

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false

setup.template.settings:
  index.number_of_shards: 3

setup.kibana:

output.logstash:
  hosts: ["localhost:5044"]

Note that I commented out the json.keys_under_root setting; with that setting disabled, Filebeat places the JSON-formatted log entry under a json field that is sent on to Logstash.
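
To illustrate the shape (a hand-written sketch, not captured output, with most of the payload elided), Logstash now receives events roughly like this, with the original log entry nested under json alongside Filebeat's own metadata:

{
    "@timestamp": "2018-09-06T19:29:32.128Z",
    "source": "d:/clients/company-here/rpms/logs/rpmsdev/actionsCurrent.json",
    "json": {
        "time": "2018-09-06T14:29:32.128",
        "level": "ERROR",
        "logger": "RPMS.WebAPI.Filters.LogExceptionAttribute",
        "eventProperties": { "...": "..." }
    }
}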

Here is a snippet of my new Logstash pipeline configuration:

#...

filter {

    ###########################################################################
    # common date time extraction
    date {
        match => ["[json][time]", "ISO8601"]
        remove_field => ["[json][time]"]
    }

    ###########################################################################
    # configuration for the actions log
    if [source] =~ /actionsCurrent.json/ {

        if ("" in [json][eventProperties][logAction][performedByUserName]) {
            mutate {
                add_field => {
                    "performedByUserName" => "%{[json][eventProperties][logAction][performedByUserName]}"
                    "performedByFullName" => "%{[json][eventProperties][logAction][performedByFullName]}"
                }
                remove_field => [
                    "[json][eventProperties][logAction][performedByUserName]", 
                    "[json][eventProperties][logAction][performedByFullName]"]
            }
        }

        mutate {
            add_field => {
                "logFile" => "actions"
                "logger" => "%{[json][logger]}"
                "level" => "%{[json][level]}"
                "performedAt" => "%{[json][eventProperties][logAction][performedAt]}"
                "verb" => "%{[json][eventProperties][logAction][verb]}"
                "url" => "%{[json][eventProperties][logAction][url]}"
                "controller" => "%{[json][eventProperties][logAction][controller]}"
                "action" => "%{[json][eventProperties][logAction][action]}"
                "actionDescription" => "%{[json][eventProperties][logAction][actionDescription]}"
                "statusCode" => "%{[json][eventProperties][logAction][statusCode]}"
                "status" => "%{[json][eventProperties][logAction][status]}"
            }
            remove_field => [
                "[json][logger]",
                "[json][level]",
                "[json][eventProperties][logAction][performedAt]",
                "[json][eventProperties][logAction][verb]",
                "[json][eventProperties][logAction][url]",
                "[json][eventProperties][logAction][controller]",
                "[json][eventProperties][logAction][action]",
                "[json][eventProperties][logAction][actionDescription]",
                "[json][eventProperties][logAction][statusCode]",
                "[json][eventProperties][logAction][status]",
                "[json][logAction]",
                "[json][message]"
            ]
        }

        mutate {
            convert => {
                "statusCode" => "integer"
            }
        }

        grok {
            match => { "json" => "%{GREEDYDATA:unextractedJson}" }
            remove_field => ["json"]
        }

    }

# ...

Note the add_field configuration options in the mutate blocks that extract the properties into named fields, followed by the remove_field configuration options that remove those properties from the JSON. At the end of the filter snippet, notice the grok block that gobbles up the rest of the JSON and places it in the unextractedJson field. Finally, and most importantly, I remove the json field that was provided by Filebeat. That last bit saves me from exposing all that JSON data to Elasticsearch/Kibana.
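
One side effect worth knowing: because [json] is a hash at that point, grok coerces it to Ruby's hash notation (key=>value) rather than valid JSON, which you can see in the unextractedJson value below. If you want real JSON there instead, a possible alternative (a sketch, assuming the optional logstash-filter-json_encode plugin is installed via bin/logstash-plugin install logstash-filter-json_encode) would be:

filter {
    # Serialize whatever is left of the [json] hash into a proper
    # JSON string, then drop the hash so none of its members become
    # index fields of their own.
    json_encode {
        source => "json"
        target => "unextractedJson"
    }
    mutate {
        remove_field => ["json"]
    }
}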

This solution takes log entries that look like this:

{ "time": "2018-09-13T13:36:45.376", "level": "DEBUG", "logger": "RPMS.WebAPI.Filters.LogActionAttribute", "message": "Log Action: RPMS.WebAPI.Entities.LogAction", "eventProperties": {"logAction": {"logActionId":26270372,"performedByUserId":"83fa1d72-fac2-4184-867e-8c2935a262e6","performedByUserName":"rpmsadmin@domain.net","performedByFullName":"Super Admin","clientIpAddress":"::1","controller":"Account","action":"Logout","actionDescription":"Logout.","url":"http://localhost:49399/api/Account/Logout","verb":"POST","statusCode":200,"status":"OK","request":null,"response":null,"performedAt":"2018-09-13T13:36:45.3707739-05:00"}}, "logAction": "RPMS.WebAPI.Entities.LogAction" }

And turns them into Elasticsearch documents that look like this:

{
  "_index": "actions-2018.09.13",
  "_type": "doc",
  "_id": "xvA41GUBIzzhuC5epTZG",
  "_version": 1,
  "_score": null,
  "_source": {
    "level": "DEBUG",
    "tags": [
      "beats_input_raw_event"
    ],
    "@timestamp": "2018-09-13T18:36:45.376Z",
    "status": "OK",
    "unextractedJson": "{\"eventProperties\"=>{\"logAction\"=>{\"performedByUserId\"=>\"83fa1d72-fac2-4184-867e-8c2935a262e6\", \"logActionId\"=>26270372, \"clientIpAddress\"=>\"::1\"}}}",
    "action": "Logout",
    "source": "d:\\path\\actionsCurrent.json",
    "actionDescription": "Logout.",
    "offset": 136120,
    "@version": "1",
    "verb": "POST",
    "statusCode": 200,
    "controller": "Account",
    "performedByFullName": "Super Admin",
    "logger": "RPMS.WebAPI.Filters.LogActionAttribute",
    "input": {
      "type": "log"
    },
    "url": "http://localhost:49399/api/Account/Logout",
    "logFile": "actions",
    "host": {
      "name": "Development5"
    },
    "prospector": {
      "type": "log"
    },
    "performedAt": "2018-09-13T13:36:45.3707739-05:00",
    "beat": {
      "name": "Development5",
      "hostname": "Development5",
      "version": "6.4.0"
    },
    "performedByUserName": "rpmsadmin@domain.net"
  },
  "fields": {
    "@timestamp": [
      "2018-09-13T18:36:45.376Z"
    ],
    "performedAt": [
      "2018-09-13T18:36:45.370Z"
    ]
  },
  "sort": [
    1536863805376
  ]
}

The depth limit can be set per index directly in Elasticsearch.

Elasticsearch field mapping documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html#mapping-limit-settings

From the docs:

index.mapping.depth.limit: The maximum depth for a field, which is measured as the number of inner objects. For instance, if all fields are defined at the root object level, then the depth is 1. If there is one object mapping, then the depth is 2, etc. Default is 20.
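
A minimal sketch of applying it at index creation time (the index name here is just an illustration); note that, as I read the docs, documents nesting deeper than the limit are rejected at index time rather than silently truncated:

PUT my-index
{
  "settings": {
    "index.mapping.depth.limit": 2
  }
}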

Related answer: Limiting the nested fields in Elasticsearch
