logstash: Want to match against “single grok pattern” and “multiple grok patterns” in same filter
My log file lines look like the following.
[2020-07-10T10:00:04.979+00:00] [app_server1] [NOTIFICATION] [report-thread] [tid: 1346887] [userId: user10.id2] start-getchunk: Report=/Report Folder Path/Report name, TemplateName=Template1004, OutFormat=excel, Locale=en_US
[2020-07-10T10:00:25.085+00:00] [app_server2] [NOTIFICATION] [report-thread] [tid: 1346887] [userId: user1.id1] end-getchunk: Report=/Report Folder Path/Report name, TemplateName=Template2007, OutFormat=html, Locale=en_US
[2020-07-10T10:00:25.080+00:00] [app_server2] [NOTIFICATION] [report-thread] [tid: 1346887] [userId: user2.id1][
Start report processing details
-----------------------------------------------------
Report path: /Report Folder Path/Report name
Datamodel name: /Report Folder Path/Datamodel name
User name: user2
Output format: 1
chunk size limit: 524288000
-----------------------------------------------------
End report processing details
]
[Log lines with some other patterns]
I want the following kinds of match patterns in my logstash config file, so that each kind of line is matched.
(a) match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\sstart-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
(b) match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\send-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
(c) to match a multiline event, using this match
break_on_match => false
match => {
"message" => [
"Report path: (?<report_path>[^,\r\n]*)",
"Datamodel name: (?<datamodel_name>[^,\r\n]*)",
"User name: (?<user_name>[^,\r\n]*)",
"(?<event_type>Start report processing details)"
]
}
(d) any number of other matches like the ones in points (a) and (b) above
Now, let's assume my config contains only (a), (b) and (c).
My config looks similar to the one below (only the order of the "match =>" lines changes).
filter{
grok{
break_on_match => false
## single grok pattern match
match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\sstart-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\send-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
## multiple grok pattern match
match => {
"message" => [
"Report path: (?<report_path>[^,\r\n]*)",
"Datamodel name: (?<datamodel_name>[^,\r\n]*)",
"User name: (?<user_name>[^,\r\n]*)",
"(?<event_type>Start report processing details)"
]
}
}
}
Now the problem statement: the config only matches the pattern of the last of the three "match =>" entries in the config file. So, if in the config file
[A] the sequence of "match =>" lines is (a) -> (b) -> (c) (as shown in the config above): it matches the patterns for (c) only.
[B] the sequence of "match =>" lines is (c) -> (b) -> (a): it matches the pattern for (a) only.
[C] the sequence of "match =>" lines is (c) -> (a) -> (b): it matches the pattern for (b) only.
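However, I want all of (a), (b) and (c) to be matched against the input.
Also, I want to keep the match patterns for (a), (b) and (c) as they are. Originally, (c) used a single grok pattern like the ones for (a) and (b), and at that time the config did try to match all of (a), (b) and (c). But because of a "_groktimeout" error for (c), I changed the pattern for (c) to the one mentioned in this reference [1].
To solve this, I tried another option, shown below. It works, but it is not a performance-efficient option. So I am looking for a better, more efficient option that does not put a grok inside an "if" condition.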
filter{
grok{
## single grok pattern match
match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\sstart-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] \[] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] (?<event_type>\send-getchunk): Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
match => { "message" => "\[%{TIMESTAMP_ISO8601:process_timstamp}\] \[%{WORD:app_server}] \[%{WORD:log_level}] %{GREEDYDATA}(?<event_type>Start report processing details)"}
}
if "Start report processing details" in [event_type] {
grok {
break_on_match => false
## multiple grok pattern match
match => {
"message" => [
"Report path: (?<report_path>[^,\r\n]*)",
"Datamodel name: (?<datamodel_name>[^,\r\n]*)",
"User name: (?<user_name>[^,\r\n]*)",
"(?<event_type>Start report processing details)"
]
}
}
}
}
My expected output is shown below; to produce it I removed a few fields in the config file.
{"tags":["reportlog"],"userId":"user10.id2","process_timestamp":"2020-07-10T10:00:25.085+00:00","thread_type":"report-thread","template_name":"Template2007","log_level":"NOTIFICATION","event_type":"end-getchunk","output_format":"html","tid":"199163","app_server":"app_server2","report_name":"/Report Folder Path/Report name"}
{"tags":["reportlog"],"userId":"user1.id1","process_timestamp":"2020-07-10T10:00:04.979+00:00","thread_type":"report-thread","template_name":"Template1004","log_level":"NOTIFICATION","event_type":"start-getchunk","output_format":"excel","tid":"800760","app_server":"app_server1","report_name":"/Report Folder Path/Report name"}
{"tags":["multiline","reportlog"],"userId":["user2.id1"],"report_path":"/Report Folder Path/Report name", "process_timestamp":["2020-07-10T10:00:25.080+00:00"],"datamodel_name":"/Report Folder Path/Datamodel name","thread_type":"report-thread","log_level":"NOTIFICATION","event_type":"Start report processing details","tid":["812409"],"app_server":"app_server2"}
[1]: https://discuss.elastic.co/t/getting-groktimeout-error-for-a-particular-filter-sometimes-need-help-optimizing-the-filter/246342
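The only difference between your first two messages is that one contains the string start-getchunk and the other contains the string end-getchunk. You save this string into the same field for both message types, so your grok patterns (a) and (b) are essentially the same.
The following pattern will match both types of message.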
\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] %{DATA:event_type}: Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)
Instead of using (?<event_type>\sstart-getchunk): and (?<event_type>\send-getchunk):, use %{DATA:event_type}:.
For your multiline log, since you did not share your input configuration, I used the following multiline codec configuration to reproduce it.
codec => multiline {
pattern => "^\["
negate => true
what => "previous"
}
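For context, here is a minimal sketch of where that codec might sit inside an input block. The file input and the path are assumptions, since the question does not show how the log is read. auto_flush_interval is an optional setting of the multiline codec, added here so that the last buffered event is not held back while the codec waits for a new line; the value of 2 seconds is only an illustration.

```
input {
  file {
    # Hypothetical path; replace it with the real location of the report log.
    path => "/path/to/report.log"
    codec => multiline {
      # A line that does NOT start with "[" is appended to the previous line,
      # so the "Start report processing details" block becomes one event.
      pattern => "^\["
      negate => true
      what => "previous"
      # Assumed value: emit a pending multiline event after 2 seconds of silence.
      auto_flush_interval => 2
    }
  }
}
```

With this in place, the closing "]" line of the details block is folded into the same event, and the next line that starts with "[" begins a new one.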
The grok configuration you are using for this log is wrong.
Having this:
grok {
match => {
"message" => [
"Report path: (?<report_path>[^,\r\n]*)",
"Datamodel name: (?<datamodel_name>[^,\r\n]*)"
]
}
}
is the same as this configuration:
grok {
match => { "message" => "Report path: (?<report_path>[^,\r\n]*)" }
match => { "message" => "Datamodel name: (?<datamodel_name>[^,\r\n]*)" }
}
What you need is a single pattern that matches all the fields in the message. The pattern below will match your multiline log.
\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\]%{DATA}(?<event_type>Start report processing details)%{DATA}Report path: (?<report_path>[^,\r\n]*)%{DATA}Datamodel name: (?<datamodel_name>[^,\r\n]*)%{DATA}User name: (?<user_name>[^,\r\n]*)
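You also need to set break_on_match to true. Once a line has matched one pattern there is no need to test it against the others; doing so only adds more processing to your pipeline.
The following filter will match all of your sample lines.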
filter{
grok {
break_on_match => true
match => { "message" => "^\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] %{DATA:event_type}: Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)" }
match => { "message" => "^\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\]%{DATA}(?<event_type>Start report processing details)%{DATA}Report path: (?<report_path>[^,\r\n]*)%{DATA}Datamodel name: (?<datamodel_name>[^,\r\n]*)%{DATA}User name: (?<user_name>[^,\r\n]*)" }
}
}
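As a variant, the same two patterns can be listed in a single array under one "message" key. This is only a sketch: the patterns are copied unchanged from the filter above, and the array form is the one already described in this answer as equivalent to separate match lines. It may be worth trying if repeated match lines behave in your Logstash version the way the question describes (only the last one taking effect), which I have not verified.

```
filter {
  grok {
    # Stop at the first pattern that matches.
    break_on_match => true
    match => {
      "message" => [
        # start-getchunk / end-getchunk lines
        "^\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\] %{DATA:event_type}: Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)",
        # multiline "Start report processing details" events
        "^\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\]%{DATA}(?<event_type>Start report processing details)%{DATA}Report path: (?<report_path>[^,\r\n]*)%{DATA}Datamodel name: (?<datamodel_name>[^,\r\n]*)%{DATA}User name: (?<user_name>[^,\r\n]*)"
      ]
    }
  }
}
```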
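The ^ in the patterns anchors them to the start of the message: if for some reason your message does not start with [, grok will not even try to parse it.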
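Since grok relies on regular expressions, it can run into performance problems. If you keep getting timeout errors, it is better to change your approach and start using conditionals to parse certain messages.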
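You can try using one grok to parse all the common fields and save the part of the message that differs in another field, then use conditionals to send that field to the right grok. This works better than having a single grok try to match everything.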
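A rough sketch of that approach follows, under a few assumptions. The field name log_body is hypothetical; the leading (?m) is there so that the remainder of a multiline event, newlines included, ends up in log_body; and the per-type patterns are adapted from the ones above rather than taken from a tested configuration.

```
filter {
  # Step 1: parse the header shared by every line and keep the rest
  # of the event in a hypothetical field called log_body.
  grok {
    match => {
      "message" => "(?m)^\[%{TIMESTAMP_ISO8601:process_timestamp}\] \[%{WORD:app_server}\] \[%{WORD:log_level}\] \[%{DATA:thread_type}\] \[tid: %{NUMBER:tid}\] \[userId: %{DATA:userId}\]%{GREEDYDATA:log_body}"
    }
  }

  # Step 2: route log_body to a small grok that only knows one message type.
  if [log_body] =~ /getchunk: / {
    grok {
      match => {
        "log_body" => "^\s*%{DATA:event_type}: Report=(?<report_name>[^,]*), TemplateName=(?<template_name>[^,]*), OutFormat=(?<output_format>[^,]*), Locale=(?<locale>[^,]*)"
      }
    }
  } else if "Start report processing details" in [log_body] {
    grok {
      # Try every pattern so that each one contributes its own field.
      break_on_match => false
      match => {
        "log_body" => [
          "(?<event_type>Start report processing details)",
          "Report path: (?<report_path>[^,\r\n]*)",
          "Datamodel name: (?<datamodel_name>[^,\r\n]*)",
          "User name: (?<user_name>[^,\r\n]*)"
        ]
      }
    }
  }
}
```

The conditionals are cheap string checks, so each event runs at most one header pattern plus the short patterns for its own type. Events that match neither branch keep the header fields and log_body untouched, which leaves room for further branches for the other line types mentioned in point (d).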