
Using a Logstash Ruby filter to parse a CSV file

I have an Elasticsearch index which I am using to index a set of documents.

These documents are originally in CSV format, and I am looking to parse them using Logstash.

My problem is that I have something along the following lines.

field1,field2,field3,xyz,abc

field3 is something like 123456789, and I want to parse it as 4.56(789) using a Ruby code filter.
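For reference, the intended digit mapping can be sketched in plain Ruby. This is only an illustration of the desired output, assuming the value is always a nine-digit string; the helper name `reformat_number` is made up:

```ruby
# Sketch of the desired transformation: "123456789" -> "4.56(789)".
# Assumes a nine-digit input: the first three digits are dropped,
# digit 4 becomes the integer part, digits 5-6 the fraction, and
# digits 7-9 go inside the parentheses.
def reformat_number(num)
  "#{num[3]}.#{num[4, 2]}(#{num[6, 3]})"
end

puts reformat_number('123456789')  # prints 4.56(789)
```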

My attempt:

I tried it with stdin and stdout using the following logstash.conf.

input {
        stdin {
        }
}

filter {
        ruby {
                code => "
                  b = event["message"]
                  string2=""
                  for counter in (3..(num.size-1))
                         if counter == 4
                                string2+= '_'+ num[counter]
                         elsif counter ==  6
                                string2+= '('+num[counter]
                         elsif counter == 8
                                string2+= num[counter]  +')'
                         else
                                string2+= num[counter]
                         end

                  end

                 event["randomcheck"] = string2

                "
        }
}


output {
        stdout {
                codec=>rubydebug
        }
}

I am getting a syntax error using this.

My final aim is to use this with my CSV file, but first I wanted to try it with stdin and stdout.

Any help will be highly appreciated.

The reason you're getting a syntax error is most likely that you have unescaped double quotes inside the double-quoted string. Either make the string single-quoted, or keep it double-quoted but use single quotes inside. I also don't understand how that code is supposed to work.
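As a rough sketch, the same filter with the quoting fixed (single quotes inside the double-quoted `code` string) and the undefined `num` variable bound to the event's message might look like the following. Note that it preserves the original logic, which emits '_' where the question describes '.', and that on Logstash 5+ you would need `event.get('message')` / `event.set('randomcheck', ...)` instead of the bracket accessors:

```
filter {
  ruby {
    code => "
      num = event['message']
      string2 = ''
      for counter in 3..(num.size - 1)
        if counter == 4
          string2 += '_' + num[counter]
        elsif counter == 6
          string2 += '(' + num[counter]
        elsif counter == 8
          string2 += num[counter] + ')'
        else
          string2 += num[counter]
        end
      end
      event['randomcheck'] = string2
    "
  }
}
```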

But that aside, why use a ruby filter in the first place? You can use a csv filter for the CSV parsing and a couple of standard filters to transform 123456789 to 4.56(789).

filter {
  # Parse the CSV fields and then delete the 'message' field.
  csv {
    remove_field => ["message"]
  }
  # Given an input such as 123456789, extract 4, 56, and 789 into
  # their own fields.
  grok {
    match => [
      "column3",
      "\d{3}(?<intpart>\d)(?<fractionpart>\d{2})(?<parenpart>\d{3})"
    ]
  }
  # Put the extracted fields together into a single field again,
  # then delete the temporary fields.
  mutate {
    replace => ["column3", "%{intpart}.%{fractionpart}(%{parenpart})"]
    remove_field => ["intpart", "fractionpart", "parenpart"]
  }
}

The temporary fields have really bad names in the example above since I don't know what they represent. Also, depending on what the input can look like, you may have to adjust the grok expression. As it stands, it assumes nine-digit input.
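To double-check the grok pattern outside Logstash, the same extraction can be replicated in plain Ruby. The `transform` helper below is illustrative, not part of any Logstash API:

```ruby
# Replicates the grok pattern above: skip three digits, then capture
# 1 + 2 + 3 digits and reassemble them as intpart.fractionpart(parenpart).
PATTERN = /\d{3}(?<intpart>\d)(?<fractionpart>\d{2})(?<parenpart>\d{3})/

def transform(column3)
  m = PATTERN.match(column3)
  # grok would tag a non-matching event; here we just pass the value through
  return column3 if m.nil?
  "#{m[:intpart]}.#{m[:fractionpart]}(#{m[:parenpart]})"
end

puts transform('123456789')  # prints 4.56(789)
```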
