
Elasticsearch throws string-to-date conversion error when importing CSV with Logstash

I'm trying to import a simple CSV file into Elasticsearch with Logstash.

But when I run it, I get the following error while converting the string in the first column (Date) to a date:

"error"=>{
"type"=>"mapper_parsing_exception", 
"reason"=>"failed to parse [Date]",
"caused_by"=>{
    "type"=>"illegal_argument_exception", 
    "reason"=>"Invalid format: \"Date\""}}}}

When I remove the Date column, all works great.

I'm using the following csv file:

Date,Open,High,Low,Close,Volume,Adj Close
2015-04-02,125.03,125.56,124.19,125.32,32120700,125.32
2015-04-01,124.82,125.12,123.10,124.25,40359200,124.25
2015-03-31,126.09,126.49,124.36,124.43,41852400,124.43
2015-03-30,124.05,126.40,124.00,126.37,46906700,126.37

and the following logstash.conf:

input {
  file {
    path => "path/file.csv"
    type => "core2"
    start_position => "beginning"    
  }
}
filter {
  csv {
      separator => ","
      columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
  }
  mutate {convert => ["High", "float"]}
  mutate {convert => ["Open", "float"]}
  mutate {convert => ["Low", "float"]}
  mutate {convert => ["Close", "float"]}
  mutate {convert => ["Volume", "float"]}
  date {
    match => ["Date", "yyyy-MM-dd"]
    target => "Date"
  }
}
output {  
    elasticsearch {
        action => "index"
        hosts => "localhost"
        index => "stock15"
        workers => 1
    }
    stdout {}
}

Seems I'm handling the Date fine. Any idea what could have gone wrong?

Thanks!

The problem is in the file itself. Logstash reads the first line (the header) and is unable to parse it:

Date,Open,High,Low,Close,Volume,Adj Close
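To see why, consider what the csv filter makes of that line: each column gets its own name as its value, so the event carries a Date field whose value is the literal string "Date". The date filter cannot match "Date" against "yyyy-MM-dd", tags the event with _dateparsefailure and leaves the string as-is, and Elasticsearch then refuses to index that string into a date-mapped field, hence the "Invalid format: \"Date\"" in your error. Roughly (an illustrative rubydebug view, other fields omitted):

{
    "Date" => "Date",
    "tags" => [ "_dateparsefailure" ]
}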

Right now the solution is to remove the header line from the file:

2015-04-02,125.03,125.56,124.19,125.32,32120700,125.32
2015-04-01,124.82,125.12,123.10,124.25,40359200,124.25
2015-03-31,126.09,126.49,124.36,124.43,41852400,124.43
2015-03-30,124.05,126.40,124.00,126.37,46906700,126.37

And it should be okay.

There is an issue about this on GitHub: https://github.com/elastic/logstash/issues/2088
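If you would rather keep the file untouched, recent versions of the logstash-filter-csv plugin also have a skip_header option that is meant to drop the row whose values match the configured column names, which is exactly what your first line is. A minimal sketch, assuming your plugin version supports it (check the csv filter docs for your release):

csv {
    separator => ","
    skip_header => true
    columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
}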

Thanks @Yeikel, I ended up changing the Logstash config rather than the data itself.

Before applying the csv filter, I check the line against a regex to see whether it is the header. If it is the header I drop the event and continue to the next line (which will be handled by the csv filter).

Please see the updated config that solves the header issue:

input {  
  file {
    path => "path/file.csv"
    start_position => "beginning"    
  }
}
filter {  
    if ([message] =~ "\bDate\b") {
        drop { }
    } else {
        csv {
            separator => ","
            columns => ["Date","Open","High","Low","Close","Volume","Adj Close"]
        }
        mutate {convert => ["High", "float"]}
        mutate {convert => ["Open", "float"]}
        mutate {convert => ["Low", "float"]}
        mutate {convert => ["Close", "float"]}
        mutate {convert => ["Volume", "float"]}
      date {
        match => ["Date", "yyyy-MM-dd"]
      }
    }
}
output {  
    elasticsearch {
        action => "index"
        hosts => "localhost"
        index => "stock15"
        workers => 1
    }
    stdout {
        codec => rubydebug
     }
}
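One practical note for re-testing, unrelated to the header fix: the file input records how far it has read in a sincedb file, so re-running Logstash against the same CSV may yield no events at all. For throwaway tests you can point sincedb_path at a disposable location; a testing-only sketch (on Linux/macOS):

input {
  file {
    path => "path/file.csv"
    start_position => "beginning"
    sincedb_path => "/dev/null"
  }
}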
