
Using Logstash Ruby filter to parse csv file

I have an elasticsearch index which I am using to index a set of documents.

These documents are originally in csv format and I am looking to parse them using logstash.

My problem is that each line looks something like the following.

field1,field2,field3,xyz,abc

field3 is something like 123456789 and I want to parse it as 4.56(789) using a ruby code filter.

My try:

I tried with stdin and stdout with the following logstash.conf.

input {
        stdin {
        }
}

filter {
        ruby {
                code => "
                  b = event["message"]
                  string2=""
                  for counter in (3..(num.size-1))
                         if counter == 4
                                string2+= '_'+ num[counter]
                         elsif counter ==  6
                                string2+= '('+num[counter]
                         elsif counter == 8
                                string2+= num[counter]  +')'
                         else
                                string2+= num[counter]
                         end

                  end

                 event["randomcheck"] = string2

                "
        }
}


output {
        stdout {
                codec=>rubydebug
        }
}

I am getting a syntax error using this.

My final aim is to use this with my csv file, but first I was trying this with stdin and stdout.

Any help will be highly appreciated.

The reason you're getting a syntax error is most likely that you have unescaped double quotes inside the double-quoted string. Either make the string single quoted, or keep it double quoted but use single quotes inside. I also don't understand how that code is supposed to work.
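For reference, the loop in the question reads from `num`, which is never assigned, so even with the quoting fixed it would not run as written. Below is a minimal plain-Ruby sketch of what the transformation seems to be aiming for, based only on the 123456789 to 4.56(789) example. The helper name `format_field3` is hypothetical, and dropping the first three digits is an assumption taken from that single example. Inside a ruby filter the same logic would have to sit in a `code` string whose outer quotes don't clash with the quotes used in the code.

```ruby
# Hypothetical helper: reformat a nine-digit string such as "123456789"
# into "4.56(789)". Dropping the first three digits is an assumption
# based on the single example given in the question.
def format_field3(value)
  digits = value.to_s
  # Return anything that is not exactly nine digits unchanged.
  return digits unless digits.match?(/\A\d{9}\z/)

  int_part      = digits[3]     # fourth digit
  fraction_part = digits[4, 2]  # fifth and sixth digits
  paren_part    = digits[6, 3]  # last three digits
  "#{int_part}.#{fraction_part}(#{paren_part})"
end

puts format_field3("123456789")  # prints 4.56(789)
```

Values that are not exactly nine digits are passed through untouched here; whether that is the right behaviour depends on the real data.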

But that aside, why use a ruby filter in the first place? You can use a csv filter for the CSV parsing and a couple of standard filters to transform 123456789 to 4.56(789).

filter {
  # Parse the CSV fields and then delete the 'message' field.
  csv {
    remove_field => ["message"]
  }
  # Given an input such as 123456789, extract 4, 56, and 789 into
  # their own fields.
  grok {
    match => [
      "column3",
      "\d{3}(?<intpart>\d)(?<fractionpart>\d{2})(?<parenpart>\d{3})"
    ]
  }
  # Put the extracted fields together into a single field again,
  # then delete the temporary fields.
  mutate {
    replace => ["column3", "%{intpart}.%{fractionpart}(%{parenpart})"]
    remove_field => ["intpart", "fractionpart", "parenpart"]
  }
}

The temporary fields have really bad names in the example above since I don't know what they represent. Also, depending on what the input can look like, you may have to adjust the grok expression. As it stands now it assumes nine-digit input.
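To illustrate the nine-digit assumption, here is a small plain-Ruby sketch that runs the same expression as an ordinary regexp (Ruby and grok share the `(?<name>...)` named-capture syntax). The constant names and the `rebuild` helper are hypothetical, and the anchored variant is only one possible adjustment; in a grok pattern the anchors would presumably be written as `^` and `$`.

```ruby
# The expression from the grok filter, written as a Ruby regexp.
UNANCHORED = /\d{3}(?<intpart>\d)(?<fractionpart>\d{2})(?<parenpart>\d{3})/
# Assumption: anchor the pattern so only an exact nine-digit value matches.
ANCHORED = /\A\d{3}(?<intpart>\d)(?<fractionpart>\d{2})(?<parenpart>\d{3})\z/

# Hypothetical helper mirroring the mutate/replace step: rebuild the
# value from the named captures, or return nil when nothing matches.
def rebuild(pattern, value)
  m = pattern.match(value)
  return nil if m.nil?

  "#{m[:intpart]}.#{m[:fractionpart]}(#{m[:parenpart]})"
end

p rebuild(UNANCHORED, "123456789")   # "4.56(789)"
p rebuild(UNANCHORED, "1234567890")  # "4.56(789)", the trailing 0 is silently dropped
p rebuild(ANCHORED, "1234567890")    # nil
p rebuild(ANCHORED, "12345")         # nil
```

The second call shows why the expression may need adjusting: without anchors, a longer value still matches on its first nine digits and the rest is lost once the field is replaced.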
