Parse Nginx Ingress Access Log in FluentD Using Multi Format Parser (Regex)
I have an Nginx ingress controller in my K8S cluster with the following log format (taken from /etc/nginx/nginx.conf):
log_format upstreaminfo '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';
My goal is to parse the Nginx logs and push them to CloudWatch. Note that the Nginx log file contains both the Nginx application logs (e.g. info and warning messages) and the access logs. My understanding is that I have to use the multi-format-parser plugin, so I configured FluentD as follows (see the `expression` in the @nginx filter):
<source>
@type tail
@id in_tail_container_logs
@label @containers
path /var/log/containers/*.log
exclude_path ["/var/log/containers/cloudwatch-agent*", "/var/log/containers/fluentd*", "/var/log/containers/nginx*"]
pos_file /var/log/fluentd-containers.log.pos
tag *
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<source>
@type tail
@id in_tail_nginx_container_logs
@label @nginx
path /var/log/containers/nginx*.log
pos_file /var/log/fluentd-nginx.log.pos
tag *
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<source>
@type tail
@id in_tail_cwagent_logs
@label @cwagentlogs
path /var/log/containers/cloudwatch-agent*
pos_file /var/log/cloudwatch-agent.log.pos
tag *
read_from_head true
<parse>
@type json
time_format %Y-%m-%dT%H:%M:%S.%NZ
</parse>
</source>
<label @containers>
<filter **>
@type parser
key_name log
format json
reserve_data true
</filter>
<filter **>
@type kubernetes_metadata
@id filter_kube_metadata
</filter>
<filter **>
@type record_transformer
@id filter_containers_stream_transformer
<record>
stream_name ${tag_parts[3]}
</record>
</filter>
<filter **>
@type concat
key log
multiline_start_regexp /^\S/
separator ""
flush_interval 5
timeout_label @NORMAL
</filter>
<match **>
@type relabel
@label @NORMAL
</match>
</label>
<label @nginx>
<filter **>
@type kubernetes_metadata
@id filter_nginx_kube_metadata
</filter>
<filter **>
@type record_transformer
@id filter_nginx_containers_stream_transformer
<record>
stream_name ${tag_parts[3]}
</record>
</filter>
<filter **>
@type parser
key_name log
<parse>
@type multi_format
<pattern>
format regexp
expression /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?:\[(?<proxy_alternative_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<request_id>[^ ]*)\n$/
</pattern>
</parse>
</filter>
<match **>
@type relabel
@label @NORMAL
</match>
</label>
<label @cwagentlogs>
<filter **>
@type kubernetes_metadata
@id filter_kube_metadata_cwagent
</filter>
<filter **>
@type record_transformer
@id filter_cwagent_stream_transformer
<record>
stream_name ${tag_parts[3]}
</record>
</filter>
<filter **>
@type concat
key log
multiline_start_regexp /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/
separator ""
flush_interval 5
timeout_label @NORMAL
</filter>
<match **>
@type relabel
@label @NORMAL
</match>
</label>
<label @NORMAL>
<match **>
@type cloudwatch_logs
@id out_cloudwatch_logs_containers
region "#{ENV.fetch('REGION')}"
log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
log_stream_name_key stream_name
remove_log_stream_name_key true
auto_create_stream true
<buffer>
flush_interval 5
chunk_limit_size 2m
queued_chunks_limit_size 32
retry_forever true
</buffer>
</match>
</label>
Now I am seeing parser errors like the following in the logs:
...#0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/ Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n'"
..."log"=>"10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n"
I'm not sure whether the problem is in my regex or somewhere else in the configuration. (Note that I haven't added a parser for the Nginx application logs yet!) Thanks.
Not an answer per se, since I think my regex wasn't quite right. But because I have access to Nginx, I simply changed the log format to JSON instead of parsing it with a regex:
'log-format-upstream': '{ "app": "nginx", "time":"$time_iso8601", "remote_addr":"$remote_addr", "remote_user":"$remote_user", "forwarded_for":"$http_x_forwarded_for", "host":"$host", "res_status":"$status", "res_body_size":"$body_bytes_sent", "res_size":"$bytes_sent", "req_id":"$req_id", "req_uri":"$uri", "req_time":"$request_time", "req_proto":"$server_protocol", "req_query":"$query_string", "req_length":"$request_length", "req_method":"$request_method", "agent":"$http_user_agent", "up_name": "$proxy_upstream_name", "up_addr": "$upstream_addr", "up_res_status": "$upstream_status", "up_res_time": "$upstream_response_time", "up_res_length": "$upstream_response_length" }'
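With this JSON log format, the multi_format regex filter in the question can be replaced by a plain JSON parser. A minimal sketch of the @nginx parser filter under that assumption (the field names then come straight from the format string above):

```
<filter **>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
  </parse>
</filter>
```

Non-access lines (Nginx info/warning output) will still fail this parser, so they either need their own pattern or `emit_invalid_record_to_error false` to be dropped quietly.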