Fluentd: problem with regex while parsing log
I have this fluentd configuration:
<source>
@type tail
<parse>
@type regexp
expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
time_format %d/%b/%Y:%H:%M:%S %z
keep_time_key true
types size:integer,reqtime:float,uct:float,uht:float,urt:float
</parse>
path /var/log/nginx/access.log
pos_file /tmp/fluent_nginx.pos
tag nginx
</source>
My log format:
193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014
193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005
I have tested my regex on regex101 and it works there. However, fluentd gives me a "no patterns matched" warning, and I don't understand why the logs are not parsed correctly.
Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"
Can anyone help me? Thanks!
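The regex101 check described above can be repeated locally. The sketch below is an illustrative Python port, not what Fluentd runs: Fluentd evaluates the expression with Ruby's regex engine, so the named groups are rewritten from (?<name>...) to Python's (?P<name>...) and nothing else is changed. It checks that the expression matches both sample lines and that the time_format directives parse the captured time field.

```python
import re
from datetime import datetime

# The question's expression, ported from Ruby's (?<name>...) group syntax to
# Python's (?P<name>...). Nothing else in the pattern is changed.
PATTERN = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'\"(?P<method>\w+) (?P<path>[^ ]*) (?P<http>[^ ]*)" (?P<status_code>[^ ]*) '
    r'(?P<size>[^ ]*)(?:\s"(?P<referer>[^\"]*)") "(?P<agent>[^\"]*)" (?P<urt>[^\"]*).*'
)

# Same strftime-style directives as time_format in the <parse> section.
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"

AGENT = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
         "(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36")
LINES = [
    f'193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "{AGENT}" 0.014',
    f'193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "{AGENT}" 0.005',
]


def parse(line):
    """Return the named groups plus a parsed timestamp, or None on no match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    record = m.groupdict()
    record["parsed_time"] = datetime.strptime(record["time"], TIME_FORMAT)
    return record


for line in LINES:
    rec = parse(line)
    print(rec["remote"], rec["status_code"], rec["urt"], rec["parsed_time"].isoformat())
```

Both sample lines match, so the expression itself is fine for lines shaped exactly like these samples.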
Your pattern insists that there is no space before <remote>, but in your log there are actually 4 spaces before the remote IP.
In my opinion, the easiest fix is to insert an optional, variable number of spaces at the beginning:
^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*
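The ( and ) are only there to make it easier for people reading the code: they will see that there is a space character between them, which they might otherwise not notice.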
The * means 0 or more. This allows 0 or more spaces at the beginning of the line to be matched and discarded.
I notice that you sometimes escape " with \ and sometimes don't. Is there a reason for that?
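As far as I know, the inconsistency is harmless: a backslash before a non-alphanumeric character such as " is just a literal in Python's re module, and, to my understanding, Ruby's regex engine (which Fluentd uses) treats it the same way. The Python check below covers only Python's behavior; the Ruby side is an assumption.

```python
import re

# The same agent/urt tail written two ways: every double quote escaped,
# and no double quote escaped.
escaped = re.compile(r'\"(?P<agent>[^\"]*)\" (?P<urt>[^\"]*)')
unescaped = re.compile(r'"(?P<agent>[^"]*)" (?P<urt>[^"]*)')

tail = '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)" 0.005'

# \" is a plain literal quote, so both patterns capture the same fields.
print(escaped.match(tail).groupdict())
print(escaped.match(tail).groupdict() == unescaped.match(tail).groupdict())  # True
```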
You should use the nginx parser plugin directly.
Here is a complete working example using the sample input plugin and the nginx parser plugin:
fluent-nginx-test.conf
<source>
@type sample
sample [
{ "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
{ "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
]
rate 1
size 2
tag nginx
</source>
<filter nginx>
@type parser
key_name message
<parse>
@type nginx
</parse>
</filter>
<match nginx>
@type stdout
</match>
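The <filter> section above takes the string stored under key_name (message, the key used by the sample events) and hands it to the <parse> section. A rough Python model of that flow, under stated assumptions: only key_name and reserve_data are modeled (reserve_data defaults to false, which matches the Output below containing no message field), failure handling is reduced to returning None, and toy_parse is a hypothetical stand-in, not the regex the nginx parser plugin actually uses.

```python
import re
from typing import Callable, Optional

Record = dict


def filter_parser(record: Record, parse: Callable[[str], Optional[Record]],
                  key_name: str = "message", reserve_data: bool = False) -> Optional[Record]:
    """Rough model of <filter> @type parser: parse record[key_name].

    With reserve_data false, only the parsed fields survive; with true,
    they are merged into the original record. The real plugin reports a
    parse error on failure; here that is simplified to returning None.
    """
    value = record.get(key_name)
    if value is None:
        return None
    parsed = parse(value)
    if parsed is None:
        return None
    return {**record, **parsed} if reserve_data else parsed


# Hypothetical stand-in parser covering only the first four fields.
_TOY = re.compile(r'^(?P<remote>\S+) (?P<host>\S+) (?P<user>\S+) \[(?P<time>[^\]]*)\]')


def toy_parse(text: str) -> Optional[Record]:
    m = _TOY.match(text)
    return m.groupdict() if m else None


event = {"message": '193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323'}
print(filter_parser(event, toy_parse))
```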
Run
$ fluentd -c ./fluent-nginx-test.conf
Output
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
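Apart from that, I used your regex with the regexp parser plugin, and it works fine too (although types lists fields such as reqtime, uct and uht that your regex never captures):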
fluent-nginx-test-with-regexp.conf
<source>
@type sample
sample [
{ "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
{ "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
]
rate 1
size 2
tag nginx
</source>
<filter nginx>
@type parser
key_name message
<parse>
@type regexp
expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
time_format %d/%b/%Y:%H:%M:%S %z
keep_time_key true
types size:integer,reqtime:float,uct:float,uht:float,urt:float
</parse>
</filter>
<match nginx>
@type stdout
</match>
Run
$ fluentd -c ./fluent-nginx-test-with-regexp.conf
Output
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
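The types parameter converts the listed fields after parsing, which is why size and urt appear as numbers in the output above while status_code stays a string. Fields that the expression never captures (reqtime, uct, uht) are the redundant entries. A Python sketch of that conversion; the helper names, the three-entry type table and the partial record are simplifications for illustration (Fluentd supports more types than these).

```python
# Hypothetical model of the types parameter: "field:type" pairs separated by commas.
TYPES = "size:integer,reqtime:float,uct:float,uht:float,urt:float"

CASTS = {"integer": int, "float": float, "string": str}


def parse_types(spec):
    """Turn 'field:type,...' into {field: cast_function}."""
    out = {}
    for item in spec.split(","):
        name, type_name = item.split(":", 1)
        out[name.strip()] = CASTS[type_name.strip()]
    return out


def apply_types(record, types):
    """Cast the fields that are present; fields never captured are skipped."""
    typed = dict(record)
    for name, cast in types.items():
        if name in typed:
            typed[name] = cast(typed[name])
    return typed


# A subset of the fields the regex captures, still as strings.
record = {"status_code": "200", "size": "2323", "urt": "0.005"}
types = parse_types(TYPES)
print(apply_types(record, types))  # {'status_code': '200', 'size': 2323, 'urt': 0.005}
unused = sorted(set(types) - set(record))
print(unused)  # ['reqtime', 'uct', 'uht']
```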
However, regarding the no patterns matched tag="nginx" warning in your message:
Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"
This means that there is no corresponding match section in your configuration file. You need a match section for the tag that you want to process or output.
Example:
<source>
@type tail
# ...
tag nginx
</source>
# ...
<match nginx>
@type stdout
</match>
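Fluentd routes each event to the first <match> section, in config order, whose pattern accepts the event's tag; when none does, it logs the no patterns matched warning. The Python sketch below models the documented pattern rules as I understand them (* matches within a single tag part, ** matches zero or more parts, several space-separated patterns may share one <match>). Brace alternation such as {X,Y} is deliberately left out, and tag_matches and route are hypothetical helper names, not Fluentd APIs.

```python
import re


def _part_ok(pattern_part, tag_part):
    # '*' inside a part matches any run of non-dot characters.
    regex = "[^.]*".join(re.escape(piece) for piece in pattern_part.split("*"))
    return re.fullmatch(regex, tag_part) is not None


def _match(pparts, tparts):
    if not pparts:
        return not tparts
    head, rest = pparts[0], pparts[1:]
    if head == "**":
        # '**' swallows zero or more whole tag parts.
        return any(_match(rest, tparts[i:]) for i in range(len(tparts) + 1))
    if not tparts:
        return False
    return _part_ok(head, tparts[0]) and _match(rest, tparts[1:])


def tag_matches(pattern, tag):
    return _match(pattern.split("."), tag.split("."))


def route(tag, match_sections):
    """Return the first <match> argument (config order) accepting tag, else None."""
    for section in match_sections:
        if any(tag_matches(p, tag) for p in section.split()):
            return section
    return None  # the situation in which Fluentd warns "no patterns matched"


print(route("nginx", []))                 # None: the question's config has no <match>
print(route("nginx", ["app.**", "nginx"]))  # nginx
```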
Also, you might want to use the vscode-fluentd extension for syntax highlighting in VS Code.
Environment
fluentd
$ fluentd --version
fluentd 1.12.3
$ lsb_release -d
Description: Ubuntu 18.04.6 LTS