Fluentd: problem with regex while parsing log

I have this fluentd configuration:

<source>
  @type tail
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,reqtime:float,uct:float,uht:float,urt:float
  </parse>
  path /var/log/nginx/access.log
  pos_file /tmp/fluent_nginx.pos
  tag nginx
</source>

My log format:

193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014
193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] "GET /net/api/employee HTTP/1.1" 200 2323 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.005

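I have tested my regex on regex101 and it matches without problems. In fluentd, however, I get a "no patterns matched" warning. I don't understand why the logs are not being parsed correctly.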

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

Can anyone help me? Thanks!

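I think your problem is leading whitespace in the log.

Your pattern insists that there is no space before <remote>, but your log does have 4 spaces before the remote IP.

In my view, the simplest fix is to allow an optional, variable number of spaces at the start of the pattern.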

^( )*(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*

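How it works

The () is only there to make life easier for whoever reads the code: they will see that there is a space character between the parentheses, which they might otherwise not notice.

* means 0 or more.

Together, this matches and discards 0 or more spaces at the start of the line.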
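The effect of the optional leading spaces can be checked outside fluentd. The sketch below is a Python port of the two patterns, so named groups are written `(?P<name>...)` instead of the Ruby-style `(?<name>...)` that fluentd uses. The names `BASE`, `STRICT`, `LENIENT`, `LINE` and `INDENTED` are made up for illustration, and the 4-space indent is the assumption described above.

```python
import re

# Python port of the pattern from the question: (?<name>...) becomes (?P<name>...).
BASE = (
    r'(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\w+) (?P<path>[^ ]*) (?P<http>[^ ]*)" (?P<status_code>[^ ]*) '
    r'(?P<size>[^ ]*)(?:\s"(?P<referer>[^"]*)") "(?P<agent>[^"]*)" (?P<urt>[^"]*).*'
)
STRICT = re.compile('^' + BASE)        # original: nothing allowed before <remote>
LENIENT = re.compile('^( )*' + BASE)   # suggested: 0 or more leading spaces

LINE = ('193.137.78.17 - - [07/Jan/2023:09:21:59 +0000] '
        '"GET /net/api/employee HTTP/1.1" 200 2323 "-" '
        '"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
        '(KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36" 0.014')
INDENTED = '    ' + LINE               # assumed: 4 spaces before the remote IP

print(STRICT.match(LINE) is not None)      # unindented line, original pattern
print(STRICT.match(INDENTED) is not None)  # indented line, original pattern
print(LENIENT.match(INDENTED).group('remote'))
```

If the real log lines carry no leading spaces, both patterns match and the change makes no difference.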

By the way

I noticed that you sometimes escape " as \" and sometimes don't. Is there a reason for that?
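On the escaping question: a double quote has no special meaning in a regular expression, so `\"` and `"` should match the same text. A small Python check of that claim, using a fragment of the pattern (the names `escaped`, `plain` and `sample` are illustrative; Ruby's engine, which fluentd uses, is assumed to treat the escape the same way):

```python
import re

# Same fragment written with and without backslash-escaped quotes.
escaped = re.compile(r'\"(?P<method>\w+) [^\"]*\"')
plain = re.compile(r'"(?P<method>\w+) [^"]*"')

sample = '"GET /net/api/employee HTTP/1.1" 200'

# Both variants find the same quoted request string.
print(escaped.search(sample).group(0) == plain.search(sample).group(0))
```

Mixing the two styles is therefore harmless, but picking one makes the pattern easier to read.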

You should use the nginx parser plugin directly.

Here is a complete working example with the sample input plugin and the nginx parser plugin:

fluent-nginx-test.conf

<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type nginx
  </parse>
</filter>

<match nginx>
  @type stdout
</match>

Run

$ fluentd -c ./fluent-nginx-test.conf

Output

2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","method":"GET","path":"/net/api/employee","code":"200","size":"2323","referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","http_x_forwarded_for":"0.005"}

Apart from that, I used your regex with the regexp parser plugin and it also works fine (although the types field contains redundant values):

fluent-nginx-test-with-regexp.conf

<source>
  @type sample
  sample [
    { "message": "193.137.78.17 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" },
    { "message": "193.137.78.18 - - [07/Jan/2023:09:22:00 +0000] \"GET /net/api/employee HTTP/1.1\" 200 2323 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36\" 0.005" }
  ]
  rate 1
  size 2
  tag nginx
</source>

<filter nginx>
  @type parser
  key_name message
  <parse>
    @type regexp
    expression /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] \"(?<method>\w+) (?<path>[^ ]*) (?<http>[^ ]*)" (?<status_code>[^ ]*) (?<size>[^ ]*)(?:\s"(?<referer>[^\"]*)") "(?<agent>[^\"]*)" (?<urt>[^\"]*).*/
    time_format %d/%b/%Y:%H:%M:%S %z
    keep_time_key true
    types size:integer,reqtime:float,uct:float,uht:float,urt:float
  </parse>
</filter>

<match nginx>
  @type stdout
</match>
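The redundant `types` entries mentioned above can be listed by comparing the names in `types` with the named groups the regex actually captures. This is a rough Python sketch, again with the pattern ported to `(?P<name>...)` syntax; `PATTERN`, `TYPES`, `declared`, `captured` and `redundant` are illustrative names.

```python
import re

# Python port of the expression from the config above.
PATTERN = re.compile(
    r'^(?P<remote>[^ ]*) (?P<host>[^ ]*) (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\w+) (?P<path>[^ ]*) (?P<http>[^ ]*)" (?P<status_code>[^ ]*) '
    r'(?P<size>[^ ]*)(?:\s"(?P<referer>[^"]*)") "(?P<agent>[^"]*)" (?P<urt>[^"]*).*'
)
TYPES = 'size:integer,reqtime:float,uct:float,uht:float,urt:float'

# Field names declared in `types`, in order.
declared = [entry.split(':')[0] for entry in TYPES.split(',')]
# Field names the regex can actually produce.
captured = set(PATTERN.groupindex)

redundant = [name for name in declared if name not in captured]
print(redundant)  # ['reqtime', 'uct', 'uht']
```

Only `size` and `urt` correspond to captured groups, so only those two fields are converted.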

Run

$ fluentd -c ./fluent-nginx-test-with-regexp.conf

Output

2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.17","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
2023-01-07 14:22:00.000000000 +0500 nginx: {"remote":"193.137.78.18","host":"-","user":"-","time":"07/Jan/2023:09:22:00 +0000","method":"GET","path":"/net/api/employee","http":"HTTP/1.1","status_code":"200","size":2323,"referer":"-","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36","urt":0.005}
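The event time in this output (14:22:00 +0500) differs from the `time` field (09:22:00 +0000) only by time zone, presumably because the machine that produced the output runs at UTC+5. A quick Python check using the same `time_format` string from the config; the +05:00 offset is taken from the output above, and the variable names are illustrative:

```python
from datetime import datetime, timedelta, timezone

TIME_FORMAT = '%d/%b/%Y:%H:%M:%S %z'   # same directives as time_format in the config

# Parse the captured "time" field, then view it in a UTC+5 zone.
parsed = datetime.strptime('07/Jan/2023:09:22:00 +0000', TIME_FORMAT)
local = parsed.astimezone(timezone(timedelta(hours=5)))

print(local.strftime('%Y-%m-%d %H:%M:%S %z'))  # 2023-01-07 14:22:00 +0500
```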

However, the no patterns matched tag="nginx" warning in your message:

Jan 07 09:26:26 srv-api fluentd[14878]: 2023-01-07 09:26:26 +0000 [warn]: #0 no patterns matched tag="nginx"

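means that there is no corresponding match section in your config file. You must have a match section with the respective tag for the events that you want to process or output.

Example: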

<source>
  @type tail
  # ...
  tag nginx
</source>

# ...

<match nginx>
  @type stdout
</match>
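To illustrate why the warning appears, here is a deliberately simplified toy model of tag routing in Python. It is not fluentd's actual router: it only compares tags for exact equality and ignores wildcard patterns, and the functions `route` and `emit` are hypothetical names.

```python
def route(tag, match_patterns):
    """Return the first <match> pattern equal to the tag, or None.

    Simplification: exact string comparison only, no wildcards.
    """
    for pattern in match_patterns:
        if pattern == tag:
            return pattern
    return None


def emit(tag, match_patterns):
    """Describe what happens to an event carrying the given tag."""
    target = route(tag, match_patterns)
    if target is None:
        # No <match> section accepts this tag, so the event has nowhere to go.
        return f'[warn]: no patterns matched tag="{tag}"'
    return f'routed to <match {target}>'


print(emit('nginx', []))         # config without any <match> section
print(emit('nginx', ['nginx']))  # config with <match nginx>
```

In this model the parsing step plays no role in the warning: the event is emitted with tag nginx either way, and only the presence of a matching section decides whether it is delivered.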

Environment

  • fluentd
$ fluentd --version
fluentd 1.12.3
  • OS
$ lsb_release -d
Description:    Ubuntu 18.04.6 LTS
