I have an NGINX Ingress Controller in my K8s cluster with the following log format (taken from /etc/nginx/nginx.conf inside the container):
log_format upstreaminfo '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id';
My objective is to parse the NGINX logs and push them to CloudWatch. Note that the NGINX log files contain both application logs (e.g. info and warn entries) and access logs, so my understanding is that I have to use the multi-format-parser plugin. I therefore configured Fluentd as follows (see the expression in the @nginx filter):
<source>
  @type tail
  @id in_tail_container_logs
  @label @containers
  path /var/log/containers/*.log
  exclude_path ["/var/log/containers/cloudwatch-agent*", "/var/log/containers/fluentd*", "/var/log/containers/nginx*"]
  pos_file /var/log/fluentd-containers.log.pos
  tag *
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
<source>
  @type tail
  @id in_tail_nginx_container_logs
  @label @nginx
  path /var/log/containers/nginx*.log
  pos_file /var/log/fluentd-nginx.log.pos
  tag *
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
<source>
  @type tail
  @id in_tail_cwagent_logs
  @label @cwagentlogs
  path /var/log/containers/cloudwatch-agent*
  pos_file /var/log/cloudwatch-agent.log.pos
  tag *
  read_from_head true
  <parse>
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
  </parse>
</source>
<label @containers>
  <filter **>
    @type parser
    key_name log
    format json
    reserve_data true
  </filter>
  <filter **>
    @type kubernetes_metadata
    @id filter_kube_metadata
  </filter>
  <filter **>
    @type record_transformer
    @id filter_containers_stream_transformer
    <record>
      stream_name ${tag_parts[3]}
    </record>
  </filter>
  <filter **>
    @type concat
    key log
    multiline_start_regexp /^\S/
    separator ""
    flush_interval 5
    timeout_label @NORMAL
  </filter>
  <match **>
    @type relabel
    @label @NORMAL
  </match>
</label>
<label @nginx>
  <filter **>
    @type kubernetes_metadata
    @id filter_nginx_kube_metadata
  </filter>
  <filter **>
    @type record_transformer
    @id filter_nginx_containers_stream_transformer
    <record>
      stream_name ${tag_parts[3]}
    </record>
  </filter>
  <filter **>
    @type parser
    key_name log
    <parse>
      @type multi_format
      <pattern>
        format regexp
        expression /^(?<host>[^ ]*) (?<domain>[^ ]*) \[(?<x_forwarded_for>[^\]]*)\] (?<server_port>[^ ]*) - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+[^\"])(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")? (?<request_length>[^ ]*) (?<request_time>[^ ]*) (?:\[(?<proxy_upstream_name>[^\]]*)\] )?(?:\[(?<proxy_alternative_upstream_name>[^\]]*)\] )?(?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<request_id>[^ ]*)\n$/
      </pattern>
    </parse>
  </filter>
  <match **>
    @type relabel
    @label @NORMAL
  </match>
</label>
<label @cwagentlogs>
  <filter **>
    @type kubernetes_metadata
    @id filter_kube_metadata_cwagent
  </filter>
  <filter **>
    @type record_transformer
    @id filter_cwagent_stream_transformer
    <record>
      stream_name ${tag_parts[3]}
    </record>
  </filter>
  <filter **>
    @type concat
    key log
    multiline_start_regexp /^\d{4}[-/]\d{1,2}[-/]\d{1,2}/
    separator ""
    flush_interval 5
    timeout_label @NORMAL
  </filter>
  <match **>
    @type relabel
    @label @NORMAL
  </match>
</label>
<label @NORMAL>
  <match **>
    @type cloudwatch_logs
    @id out_cloudwatch_logs_containers
    region "#{ENV.fetch('REGION')}"
    log_group_name "/aws/containerinsights/#{ENV.fetch('CLUSTER_NAME')}/application"
    log_stream_name_key stream_name
    remove_log_stream_name_key true
    auto_create_stream true
    <buffer>
      flush_interval 5
      chunk_limit_size 2m
      queued_chunks_limit_size 32
      retry_forever true
    </buffer>
  </match>
</label>
Now I see a parser error for the following log:
...#0 dump an error event: error_class=Fluent::Plugin::Parser::ParserError error="pattern not matched with data '10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/ Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n'"
..."log"=>"10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] \"GET /favicon.ico HTTP/1.1\" 499 0 \"-\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:79.0) Gecko/20100101 Firefox/79.0\" 901 0.000 [develop-api-8080] [] 10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9\n"
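As a quick sanity check (a minimal sketch using Python's re module, with the Ruby-style (?<name>...) groups rewritten as (?P<name>...)), even the leading part of the expression fails against that line — the pattern expects host, domain, [x_forwarded_for] and server_port fields before the user, while the upstreaminfo format starts directly with $remote_addr - $remote_user [$time_local]:

```python
import re

# The line from the Fluentd error above (user agent shortened).
line = ('10.0.1.2 - - [25/Aug/2020:11:43:09 +0000] "GET /favicon.ico HTTP/1.1" '
        '499 0 "-" "Mozilla/5.0" 901 0.000 [develop-api-8080] [] '
        '10.0.2.3:8080 0 0.000 - 3a3d3bbd02a633aaaab2af3b5284a0c9')

# Leading groups of the expression from the @nginx filter, in Python syntax.
head = re.compile(
    r'^(?P<host>[^ ]*) (?P<domain>[^ ]*) \[(?P<x_forwarded_for>[^\]]*)\] '
    r'(?P<server_port>[^ ]*) - (?P<user>[^ ]*) \[(?P<time>[^\]]*)\]')

print(head.match(line))  # None: the log begins "remote_addr - remote_user [time]"
```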
I'm not sure whether the issue is with my regex or with some other part of the configuration. (Note that I haven't added a parser for the NGINX application logs yet!) Thanks.
Not an answer per se, as I suspect the regex was simply not quite right. But since I have access to the NGINX configuration, I changed the log format to JSON instead of parsing it with a regex:
'log-format-upstream': '{ "app": "nginx", "time":"$time_iso8601", "remote_addr":"$remote_addr", "remote_user":"$remote_user", "forwarded_for":"$http_x_forwarded_for", "host":"$host", "res_status":"$status", "res_body_size":"$body_bytes_sent", "res_size":"$bytes_sent", "req_id":"$req_id", "req_uri":"$uri", "req_time":"$request_time", "req_proto":"$server_protocol", "req_query":"$query_string", "req_length":"$request_length", "req_method":"$request_method", "agent":"$http_user_agent", "up_name": "$proxy_upstream_name", "up_addr": "$upstream_addr", "up_res_status": "$upstream_status", "up_res_time": "$upstream_response_time", "up_res_length": "$upstream_response_length" }'
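With the ingress controller emitting JSON, the whole multi_format/regexp block in the @nginx label can presumably be replaced by a plain JSON parser — a sketch, assuming the "time" key produced by $time_iso8601 above (adjust time handling to your setup):

```
<filter **>
  @type parser
  key_name log
  reserve_data true
  <parse>
    @type json
    time_key time
    time_format %Y-%m-%dT%H:%M:%S%z
  </parse>
</filter>
```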