
Logstash split xml into array

Is it possible to convert XML into an array of objects using Logstash?

This is my sample document:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : "<root><Tags><TagTypeID>1</TagTypeID><TagValue>twitter</TagValue></Tags><Tags><TagTypeID>1</TagTypeID><TagValue>facebook</TagValue></Tags><Tags><TagTypeID>2</TagTypeID><TagValue>usa</TagValue></Tags><Tags><TagTypeID>3</TagTypeID><TagValue>smartphones</TagValue></Tags></root>"
}

Ideally, I'd like to output this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "Metadata" : [
    {
      "TagTypeID" : "1",
      "TagValue" : "twitter"
    },
    {
      "TagTypeID" : "1",
      "TagValue" : "facebook"
    },
    {
      "TagTypeID" : "2",
      "TagValue" : "usa"
    },
    {
      "TagTypeID" : "3",
      "TagValue" : "smartphones"
    }
  ]
}

However, I'm not able to achieve that. I tried using the xml filter like this:

xml
{
    source => "Metadata"
    target => "Parsed"
}

However, it outputs this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["twitter"]
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

I don't want my values to be stored as arrays (I know there will always be exactly one value there).

I know which fields will come back from my input, so I can map the structure myself; this doesn't need to be dynamic (although that would be nice).

"Allow splitting of lists / arrays into multiple events" seemed useful, but it's poorly documented and I couldn't find information on how to use this filter for my use case.

"Logstash, split event from an xml file in multiples documents keeping information from root tags" is similar, but not exactly what I'd like to achieve.

"Logstash: XML to JSON output from array to string" seems useful; however, it hardcodes that only the first element of the array is output as a single item (not as part of an array). It gives me this:

{
  "Title" : "My blog title",
  "Body" : "My first post ever",
  "@version" : "1",
  "@timestamp" : "2015-10-27T17:21:31.961Z",
  "Parsed" : {
    "Tags" : [
      {
        "TagTypeID" : "1",
        "TagValue" : "twitter"
      },
      {
        "TagTypeID" : ["1"],
        "TagValue" : ["facebook"]
      },
      {
        "TagTypeID" : ["2"],
        "TagValue" : ["usa"]
      },
      {
        "TagTypeID" : ["3"],
        "TagValue" : ["smartphones"]
      }
    ]
  }
}

  1. Can this be done without having to create custom filters? (I have no experience with Ruby)
  2. Or am I missing something basic here?

Here is one approach using Logstash's built-in ruby filter.

Filter section:

filter {
    xml {
        source => "Metadata"
        target => "Parsed"
    }

    # Unwrap each single-element array (e.g. ["1"]) into its only value.
    ruby { code => "
        event['Parsed']['Tags'].each do |x|
            x.each do |key, value|
                x[key] = value[0]
            end
        end"
    }
}

Output:

"Parsed":{
  "Tags":[
      {
      "TagTypeID":"1",
      "TagValue":"twitter"
      },
      {
      "TagTypeID":"1",
      "TagValue":"facebook"
      },
      {
      "TagTypeID":"2",
      "TagValue":"usa"
      },
      {
      "TagTypeID":"3",
      "TagValue":"smartphones"
      }
  ]
}

If I understand you correctly, this is your desired result. You need to specify the parsed XML field inside the ruby filter: event['Parsed']['Tags']. Does it need to be more dynamic? Let me know if you need anything else.
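If it does need to be more dynamic, the same idea can be sketched as a recursive helper in plain Ruby that runs outside Logstash. The `parsed` hash below is a hand-written stand-in for what the xml filter stores in Parsed, and `unwrap_singles` is a hypothetical helper name, not part of Logstash:

```ruby
# Hand-written stand-in for event['Parsed'] as produced by the xml filter.
parsed = {
  'Tags' => [
    { 'TagTypeID' => ['1'], 'TagValue' => ['twitter'] },
    { 'TagTypeID' => ['1'], 'TagValue' => ['facebook'] },
    { 'TagTypeID' => ['2'], 'TagValue' => ['usa'] },
    { 'TagTypeID' => ['3'], 'TagValue' => ['smartphones'] }
  ]
}

# Hypothetical helper: walk the structure and replace every one-element
# array that holds a scalar with that scalar. Arrays of hashes, and arrays
# with more than one element, stay arrays.
def unwrap_singles(node)
  case node
  when Hash
    node.each_with_object({}) { |(key, value), out| out[key] = unwrap_singles(value) }
  when Array
    if node.length == 1 && !node[0].is_a?(Hash) && !node[0].is_a?(Array)
      node[0]
    else
      node.map { |item| unwrap_singles(item) }
    end
  else
    node
  end
end

flattened = unwrap_singles(parsed)
```

Because it does not name any field, this variant would keep working if more tags or other single-valued elements were added to the XML. Wiring it into the ruby filter is left out here.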

Can this be done without having to create custom filters? (I've no experience in Ruby)

Well, yes and no. Yes, because this is not really a custom filter but a built-in one. No, because I don't think it can be done without Ruby. I admit Ruby may look like an unattractive solution, but the approach is flexible and five lines of code shouldn't hurt that much.
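One caveat: the event['Parsed']['Tags'] syntax matches the Logstash releases that were current when this was written. From Logstash 5.0 on, fields are read and written with event.get and event.set instead. A rough, untested sketch of the same filter in that style follows; writing the result back into Metadata and dropping Parsed is my assumption about the layout the question asks for:

```
filter {
    xml {
        source => "Metadata"
        target => "Parsed"
    }

    # Logstash 5.0+ event API: read with event.get, write with event.set.
    # Each single-element array is unwrapped into its only value, and the
    # flat list is stored in Metadata (assumed target layout).
    ruby {
        code => "
            tags = event.get('[Parsed][Tags]')
            unless tags.nil?
                flat = tags.map { |tag| tag.each_with_object({}) { |(k, v), h| h[k] = v.is_a?(Array) ? v[0] : v } }
                event.set('Metadata', flat)
                event.remove('Parsed')
            end
        "
    }
}
```

The nil check is there so that events whose XML failed to parse pass through unchanged instead of raising inside the ruby filter.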

The most recent Logstash version (5.1.1 at this point) has an updated xml filter with a force_array option. It is enabled by default. Setting it to false does the same thing as the ruby filter in the accepted answer.

Taken from documentation:

force_array

  • Value type is boolean
  • Default value is true

By default the filter will force single elements to be arrays. Setting this to false will prevent storing single elements in arrays.

https://www.elastic.co/guide/en/logstash/current/plugins-filters-xml.html#plugins-filters-xml-force_array
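For the filter from the question, that would look roughly like this (a sketch, assuming an xml filter version that supports force_array):

```
filter {
    xml {
        source => "Metadata"
        target => "Parsed"
        # Do not wrap single elements in arrays.
        force_array => false
    }
}
```

One thing to watch for: as far as I know, with force_array disabled a document that contains only a single Tags element ends up with Tags as a plain object rather than a one-element array, so downstream consumers that always expect an array may need to handle that case.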
