I'm trying to query a database with the Logstash JDBC input plugin and write the results to a CSV output file, with headers, using the Logstash CSV output plugin.
I've spent a lot of time in the Logstash documentation, but I'm still missing something.
With the following Logstash configuration, the result is a file that repeats the headers before every row. I couldn't find a way to write the headers only once, on the first row, from the Logstash configuration.
Help is very much appreciated.
_object$id;_object$name;_object$type;nb_surveys;csat_score
2;Jeff Karas;Agent;2;2
_object$id;_object$name;_object$type;nb_surveys;csat_score
3;John Lafer;Agent;2;2
_object$id;_object$name;_object$type;nb_surveys;csat_score
4;Michele Fisher;Agent;2;2
_object$id;_object$name;_object$type;nb_surveys;csat_score
5;Chad Hendren;Agent;2;78
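For reference, what I am after is a single header row followed by the data, i.e.:
_object$id;_object$name;_object$type;nb_surveys;csat_score
2;Jeff Karas;Agent;2;2
3;John Lafer;Agent;2;2
4;Michele Fisher;Agent;2;2
5;Chad Hendren;Agent;2;78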
input {
  jdbc {
    jdbc_connection_string => "jdbc:postgresql://localhost:5432/postgres"
    jdbc_user => "postgres"
    jdbc_password => "postgres"
    jdbc_driver_library => "/tmp/drivers/postgresql/postgresql_jdbc.jar"
    jdbc_driver_class => "org.postgresql.Driver"
    statement_filepath => "query.sql"
  }
}
output {
  csv {
    fields => ["_object$id","_object$name","_object$type","nb_surveys","csat_score"]
    path => "output/%{team}/output-%{team}.%{+yyyy.MM.dd}.csv"
    csv_options => {
      "write_headers" => true
      "headers" => ["_object$id","_object$name","_object$type","nb_surveys","csat_score"]
      "col_sep" => ";"
    }
  }
}
Thanks
The reason you are getting multiple headers in the output is that Logstash has no concept of global/shared state between events. Each event is handled in isolation, so every time the CSV output plugin runs it behaves as if it were the first run and writes the headers again.
I had the same issue and found a solution using the init option of the ruby filter to execute some code once at Logstash startup time.
Here is an example logstash config:
# csv-headers.conf
input {
  stdin {}
}
filter {
  ruby {
    # Runs once at pipeline startup: write the header row only if the
    # file is missing or empty.
    init => "
      require 'csv'
      @@csv_file = 'output.csv'
      @@csv_headers = ['A','B','C']
      if !File.exist?(@@csv_file) || File.zero?(@@csv_file)
        CSV.open(@@csv_file, 'w') do |csv|
          csv << @@csv_headers
        end
      end
    "
    # Runs per event: expose the file path to the output section via @metadata.
    # (On Logstash >= 5.x the event API changed; use
    # event.set('[@metadata][csv_file]', @@csv_file) there instead.)
    code => "
      event['@metadata']['csv_file'] = @@csv_file
      event['@metadata']['csv_headers'] = @@csv_headers
    "
  }
  csv {
    columns => ["a", "b", "c"]
  }
}
output {
  csv {
    fields => ["a", "b", "c"]
    path => "%{[@metadata][csv_file]}"
  }
  stdout {
    codec => rubydebug {
      metadata => true
    }
  }
}
If you run Logstash with that config:
echo -e "1,2,3\n4,5,6\n7,8,9" | ./bin/logstash -f csv-headers.conf
You will get an output.csv file with this content:
A,B,C
1,2,3
4,5,6
7,8,9
This is also thread-safe, because the header-writing code runs only once at startup, so you can use multiple pipeline workers.
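For example, to run the same pipeline with four workers (an arbitrary count; -w sets the number of pipeline workers):
echo -e "1,2,3\n4,5,6\n7,8,9" | ./bin/logstash -f csv-headers.conf -w 4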
Hope it helps!
I am using dynamic file names that include the date of the event (index-YYYY-MM-DD.csv), so writing the headers at pipeline start was not a viable option for me.
Instead, I let the duplicate headers be written and set up a cron job that runs every few minutes, removes all duplicate rows, and writes the result back into the same file (see the crontab sketch after the script).
#!/bin/bash -xe
# De-duplicate each CSV in place: awk keeps only the first occurrence of
# every full line ($0), which strips the repeated header rows. (Keying on
# $1, as I first tried, breaks on values containing spaces, since awk's
# default field separator is whitespace.)
for filename in /tmp/logstash/*.csv; do
    awk '!seen[$0]++' "$filename" > "$filename.tmp" && mv -f "$filename.tmp" "$filename"
done
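To schedule it, a minimal crontab entry could look like the following (the five-minute interval and the script path are placeholders; adjust both to your setup):
*/5 * * * * /path/to/dedupe-csv.sh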
NOTE: This is only tested on an instance where I am pulling a couple hundred MB of data - this may not be a viable option if your data pipeline is ingesting GB per minute.