
logstash grok, parse a line with json filter

I am using ELK (Elasticsearch, Kibana, Logstash, Filebeat) to collect logs. I have a log file with lines like the following, each containing a JSON object; my goal is to use Logstash grok to extract the key/value pairs inside the JSON and forward them to Elasticsearch.

2018-03-28 13:23:01  charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

I am using the Grok Debugger to build the pattern and check the results. My current pattern is:

%{TIMESTAMP_ISO8601} %{SPACE} %{WORD:$:data}:{%{QUOTEDSTRING:key1}:%{BASE10NUM:value1}[,}]%{QUOTEDSTRING:key2}:%{BASE10NUM:value2}[,}]%{QUOTEDSTRING:key3}:%{QUOTEDSTRING:value3}[,}]%{QUOTEDSTRING:key4}:%{QUOTEDSTRING:value4}[,}]%{QUOTEDSTRING:key5}:%{BASE10NUM:value5}[,}]

As you can see, it is hard-coded: in the real logs the keys inside the JSON can be any word, the values can be integers, doubles, or strings, and the number of keys varies. So this solution is not acceptable. My current parsing result is shown below, for reference only. I am using grok patterns.

My questions are: first, is it wise to try to extract the keys inside the JSON at all, given that Elasticsearch also uses JSON? Second, if I do want to pull the key/value pairs out of the JSON, is there a correct, concise grok pattern for it?

When parsing the first of the lines above, my current grok pattern gives the following output.

{
  "TIMESTAMP_ISO8601": [
    [
      "2018-03-28 13:23:01"
    ]
  ],
  "YEAR": [
    [
      "2018"
    ]
  ],
  "MONTHNUM": [
    [
      "03"
    ]
  ],
  "MONTHDAY": [
    [
      "28"
    ]
  ],
  "HOUR": [
    [
      "13",
      null
    ]
  ],
  "MINUTE": [
    [
      "23",
      null
    ]
  ],
  "SECOND": [
    [
      "01"
    ]
  ],
  "ISO8601_TIMEZONE": [
    [
      null
    ]
  ],
  "SPACE": [
    [
      ""
    ]
  ],
  "WORD": [
    [
      "charge"
    ]
  ],
  "key1": [
    [
      ""oldbalance""
    ]
  ],
  "value1": [
    [
      "5000"
    ]
  ],
  "key2": [
    [
      ""managefee""
    ]
  ],
  "value2": [
    [
      "0"
    ]
  ],
  "key3": [
    [
      ""afterbalance""
    ]
  ],
  "value3": [
    [
      ""5001""
    ]
  ],
  "key4": [
    [
      ""cardid""
    ]
  ],
  "value4": [
    [
      ""123456789""
    ]
  ],
  "key5": [
    [
      ""txamt""
    ]
  ],
  "value5": [
    [
      "1"
    ]
  ]
}

Second edit

Is it possible to use Logstash's json filter? In my case, though, the JSON is only part of the line/event; the whole event is not JSON.

=========================================================

Third edit

The updated solution does not parse the JSON correctly for me. My configuration is as follows:

filter {
  grok {
    match => {
      "message" => [
           "%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}"
            ]
    }       
  }
}


filter {
  json{
    source => "json_data"
    target => "parsed_json"
  } 
}

It produces no key/value pairs, only the message plus the JSON as a string; the extracted json_data is never parsed.

The test data is as follows:

2018-03-28 13:23:01  manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}
2018-03-28 13:23:03  payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}
2018-03-28 13:24:07  management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}

[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"manage:{\"cuurentValue\":5000,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'manage': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"manage:{"cuurentValue":5000,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 8]>}
[2018-06-04T15:01:30,017][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"payment:{\"cuurentValue\":5001,\"reload\":0,\"newbalance\":\"5002\",\"posid\":\"987654321\",\"something\":\"new3\",\"additionalFields\":2}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'payment': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"payment:{"cuurentValue":5001,"reload":0,"newbalance":"5002","posid":"987654321","something":"new3","additionalFields":2}"; line: 1, column: 9]>}
[2018-06-04T15:01:34,986][WARN ][logstash.filters.json    ] Error parsing json {:source=>"json_data", :raw=>"management:{\"cuurentValue\":5002,\"payment\":0,\"newbalance\":\"5001\",\"posid\":\"123456789\",\"something\":\"new2\",\"additionalFields\":1}", :exception=>#<LogStash::Json::ParserError: Unrecognized token 'management': was expecting ('true', 'false' or 'null')
 at [Source: (byte[])"management:{"cuurentValue":5002,"payment":0,"newbalance":"5001","posid":"123456789","something":"new2","additionalFields":1}"; line: 1, column: 12]>}
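The warnings above point at the actual problem: the `%{GREEDYDATA:json_data}` capture still contains the `manage:` / `payment:` label in front of the opening `{`, so the json filter is handed text that is not valid JSON ("Unrecognized token 'manage'"). One possible fix is to let grok consume that label with a `%{WORD}` pattern before the JSON starts. This is a sketch; the field names `timestamp` and `action` are illustrative, not from the original config:

```
filter {
  grok {
    match => {
      # capture the label (charge/manage/...) separately so that
      # json_data begins exactly at the opening brace
      "message" => "%{TIMESTAMP_ISO8601:timestamp}%{SPACE}%{WORD:action}:%{GREEDYDATA:json_data}"
    }
  }
  json {
    source => "json_data"
    target => "parsed_json"
  }
}
```

With this pattern, `json_data` starts at the opening brace, so the json filter should be able to parse it.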

Please check the result: (screenshot omitted)

You can use GREEDYDATA to assign the whole JSON block to a separate field, like this:

%{TIMESTAMP_ISO8601}%{SPACE}%{GREEDYDATA:json_data}

This will create a separate field for your JSON data:

{
  "TIMESTAMP_ISO8601": [
    [
      "2018-03-28 13:23:01"
    ]
  ],
  "json_data": [
    [
      "charge:{"oldbalance":5000,"managefee":0,"afterbalance":"5001","cardid":"123456789","txamt":1}"
    ]
  ]
}

Then apply the json filter on the json_data field, as shown below:

json{
    source => "json_data"
    target => "parsed_json"
} 
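Once the json filter has populated `parsed_json`, the raw string in `json_data` is usually redundant. `remove_field` is one of Logstash's common filter options and can drop it after parsing succeeds; a sketch:

```
json {
  source => "json_data"
  target => "parsed_json"
  # common option: drop the raw string once it has been parsed
  remove_field => ["json_data"]
}
```

Note that if `json_data` still carries the `charge:` label in front of the brace (as in the debugger output above), the json filter will fail; in that case the label has to be stripped or captured separately by the grok pattern first.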
