Fluentd: problem with regex while parsing log

I have this fluentd configuration:

<source>
   @type tail
   <parse>
   @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
      time_format %d/%b/%Y:%H:%M:%S %z
      keep_time_key true
      types size:integer,reqtime:float,uct:float,uht:float,urt:float
   </parse>
   path /var/log/nginx/access.log
   pos_file /tmp/fluent_nginx.pos
   tag nginx
</source>

My log format:

193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014
193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005

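I have tested my regex on regex101 and it matches without problems. However, fluentd reports a "no patterns matched" warning. I don't understand why the log is not being parsed correctly.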

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

Can anyone help me? Thanks!

I think your problem is the leading spaces in the log.

Your pattern insists that there are no spaces before <remote>, but your log does have 4 spaces before the remote IP.

In my opinion, the simplest fix is to allow an optional, variable number of spaces at the start of the pattern.

^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*

How it works

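The ( ) is only there to make life easier for anyone reading the code: they will see that there is a space character between the parentheses, which they might not notice otherwise.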

* means 0 or more.

Together they match and discard 0 or more spaces at the start of the line.
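As a rough check of the reasoning above, here is a minimal sketch in Python rather than in fluentd itself. It assumes the log line really arrives with four leading spaces, and it rewrites the named groups as (?P<name>...) because Python's re module uses that spelling; the names BODY, ORIGINAL, TOLERANT, LINE and INDENTED are made up for the illustration.

```python
import re

# Python's re spells named groups (?P<name>...) instead of (?<name>...),
# so the groups are renamed here; the rest of the expression is unchanged.
BODY = (
    r'(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'\"(?P<method>\w+) (?P<path>[^ ]*) (?P<http>[^ ]*)" '
    r'(?P<status_code>[^ ]*) (?P<size>[^ ]*)(?:\s"(?P<referer>[^\"]*)") '
    r'"(?P<agent>[^\"]*)" (?P<urt>[^\"]*).*'
)
ORIGINAL = re.compile(r'^' + BODY)       # pattern from the question
TOLERANT = re.compile(r'^( )*' + BODY)   # pattern with optional leading spaces

LINE = ('193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] '
        '"GET /net/api/employee HTTP/1.1" 200 2323 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014')
INDENTED = '    ' + LINE  # assumption: four leading spaces in the real log

print(ORIGINAL.match(LINE) is not None)          # True
print(ORIGINAL.match(INDENTED) is not None)      # False
print(TOLERANT.match(INDENTED).group('remote'))  # 193.137.78.17
```

The original pattern fails on the indented line because [^ ]* cannot consume a space, so three empty fields use up three of the four spaces and the fourth is left where \[ is expected.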

By the way

I noticed that you sometimes escape " with \ and sometimes don't. Is there a reason for that?
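For what it's worth, the mixed escaping looks harmless. The sketch below uses Python's re, not the Ruby engine that fluentd runs on, so treat it as an indication rather than proof; the variable names and the sample text are invented for the example.

```python
import re

# A backslash before a double quote is a no-op in this regex dialect:
# both spellings match a literal " character.
escaped = re.compile(r'\"(?P<referer>[^\"]*)\"')
plain = re.compile(r'"(?P<referer>[^"]*)"')

text = '2323 "-" "Mozilla/5.0"'  # made-up fragment of an access log line

print(escaped.findall(text))  # ['-', 'Mozilla/5.0']
print(plain.findall(text))    # ['-', 'Mozilla/5.0']
```

Picking one spelling and using it throughout would still make the expression easier to read.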

You should use the nginx parser plugin directly.

Here is a complete working example that combines the sample input plugin with the nginx parser plugin:

fluent-nginx-test.conf

<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type nginx
  </parse>
</filter>

<match nginx>
  @type stdout
</match>

Run

$ fluentd -c ./fluent-nginx-test.conf

Output

2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}

Apart from that, I used your regex with the regexp parser plugin and it works fine too (although the types field contains redundant values):

fluent-nginx-test-with-regexp.conf

<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,reqtime:float,uct:float,uht:float,urt:float
   </parse>
</filter>

<match nginx>
  @type stdout
</match>

Run

$ fluentd -c ./fluent-nginx-test-with-regexp.conf

Output

2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
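The redundant values in types can be spotted mechanically by comparing the declared names with the named groups of the expression. The following is only a sketch in Python that imitates the idea; it is not how fluentd applies types internally, the groups are rewritten as (?P<name>...) for Python's re, and PATTERN, TYPES and CASTS are illustrative names.

```python
import re

PATTERN = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\w+) (?P<path>[^ ]*) (?P<http>[^ ]*)" '
    r'(?P<status_code>[^ ]*) (?P<size>[^ ]*)(?:\s"(?P<referer>[^"]*)") '
    r'"(?P<agent>[^"]*)" (?P<urt>[^"]*).*'
)
TYPES = 'size:integer,reqtime:float,uct:float,uht:float,urt:float'
CASTS = {'integer': int, 'float': float}  # only the two kinds used here

# Split "name:kind" pairs and compare them with the captured group names.
declared = dict(item.split(':') for item in TYPES.split(','))
unused = sorted(name for name in declared if name not in PATTERN.groupindex)
print(unused)  # ['reqtime', 'uct', 'uht']

line = ('193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] '
        '"GET /net/api/employee HTTP/1.1" 200 2323 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005')
record = PATTERN.match(line).groupdict()
for name, kind in declared.items():
    if name in record:  # declared names without a group are skipped
        record[name] = CASTS[kind](record[name])

print(record['size'], record['urt'])  # 2323 0.005
```

Only size and urt correspond to named groups, which agrees with the output above, where those two fields are numbers and every other field stays a string.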

However, note the no patterns matched tag="nginx" warning in your log:

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

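This means that your configuration file has no corresponding match section. You must have a match section whose pattern matches the tag of the events you want to output.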

Example:

<source>
  @type tail
  # ...
  tag nginx
</source>

# ...

<match nginx>
  @type stdout
</match>
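If the configuration is long, a small script can point out tags that nothing matches. This is a rough sketch with a hypothetical helper, unmatched_tags, that only understands literal tags plus a bare ** catch-all; fluentd's real match patterns support more wildcards than this handles.

```python
import re

def unmatched_tags(config_text):
    """Return tags set by `tag` lines that no <match> pattern names literally.

    Hypothetical helper: it ignores fluentd's wildcard patterns except a
    bare `**`, which is treated as matching every tag.
    """
    tags = re.findall(r'^[ \t]*tag[ \t]+(\S+)', config_text, flags=re.MULTILINE)
    patterns = set()
    for group in re.findall(r'^[ \t]*<match[ \t]+([^>]+)>', config_text,
                            flags=re.MULTILINE):
        patterns.update(group.split())  # a match may list several patterns
    if '**' in patterns:
        return []
    return [tag for tag in tags if tag not in patterns]

BROKEN = """
<source>
  @type tail
  tag nginx
</source>
"""

FIXED = BROKEN + """
<match nginx>
  @type stdout
</match>
"""

print(unmatched_tags(BROKEN))  # ['nginx']
print(unmatched_tags(FIXED))   # []
```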

Environment

  • fluentd
$ fluentd --version
fluentd 1.12.3
  • OS
$ lsb_release -d
Description:    Ubuntu 18.04.6 LTS
