Parsing XML data from Filebeat using Logstash

I am using Filebeat to read XML files on Windows and ship them to Logstash, which filters them and sends them on to Elasticsearch.

The Filebeat job works perfectly and I'm getting the XML blocks into Logstash, but it looks like I misconfigured the Logstash filter that should parse each XML block into separate fields and put those fields into an Elasticsearch type.

Here is my XML sample data:

<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>
<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>
<H_Ticket> <IDH_Ticket>26</IDH_Ticket> <CodeBus>186</CodeBus> <CodeCh>5531</CodeCh> <CodeConv>5531</CodeConv> <Codeligne>12</Codeligne> <Date>20150915</Date> <Heur>1110</Heur> <NomFR1>SOUK AHAD</NomFR1> <NomFR2>KANTAOUI </NomFR2> <Prix>0.66</Prix> <IDTicket>26</IDTicket> <CodeRoute>107</CodeRoute> <origine>01</origine> <Distination>06</Distination> <Num>6</Num> <Ligne>107</Ligne> <requisition> </requisition> <voyage>0</voyage> <faveur> </faveur> </H_Ticket>

And here is my Logstash config file:

input {
    beats {
        port => 5044
    }
}
filter 
{
    xml 
    {
        source => "ticket"
        xpath => 
        [
            "/ticket/IDH_Ticket/text()", "ticketId",
            "/ticket/CodeBus/text()", "codeBus",
            "/ticket/CodeCh/text()", "codeCh",
            "/ticket/CodeConv/text()", "codeConv",
            "/ticket/Codeligne/text()", "codeLigne",
            "/ticket/Date/text()", "date",
            "/ticket/Heur/text()", "heure",
            "/ticket/NomFR1/text()", "nomFR1",
            "/ticket/NomAR1/text()", "nomAR1",
            "/ticket/NomFR2/text()", "nomFR2",
            "/ticket/NomAR2/text()", "nomAR2",
            "/ticket/Prix/text()", "prix",
            "/ticket/IDTicket/text()", "idTicket",
            "/ticket/CodeRoute/text()", "codeRoute",
            "/ticket/origine/text()", "origine",
            "/ticket/Distination/text()", "destination",
            "/ticket/Num/text()", "num",
            "/ticket/Ligne/text()", "ligne",
            "/ticket/requisition/text()", "requisition",
            "/ticket/voyage/text()", "voyage",
            "/ticket/faveur/text()", "faveur"
        ]
        store_xml => true
        target => "doc"
    }
}

output 
{
    elasticsearch 
    { 
        hosts => "localhost"
        index => "buses"
        document_type => "ticket"
    }
    file 
    {
        path => "C:\busesdata\logstash.log"
    }
    stdout { codec => rubydebug }
}

Filebeat configuration:

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - C:\busesdata\*.xml
      input_type: log
      document_type: ticket
      scan_frequency: 10s
      multiline:
        pattern: '<H_Ticket'
        negate: true
        match: after
output:
  ### Logstash as output
  logstash:
    hosts: ["localhost:5044"]
    index: filebeat

And here is a portion of both stdout and file output:

PS C:\logstash-2.3.3\bin> .\logstash -f .\logstash_temp.conf
io/console not supported; tty will not be manipulated
Settings: Default pipeline workers: 4
Pipeline main started

{
       "message" => "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\r\n<?xml-stylesheet href=\"ticket.xsl\" type=\"text/xsl\"?>\n<HF_DOCUMENT>",
      "@version" => "1",
    "@timestamp" => "2016-07-03T12:13:28.892Z",
        "source" => "C:\\busesdata\\ticket2.xml",
          "type" => "ticket",
    "input_type" => "log",
        "fields" => nil,
          "beat" => {
        "hostname" => "hp-pavillion-g6",
            "name" => "hp-pavillion-g6"
    },
        "offset" => 0,
         "count" => 1,
          "host" => "hp-pavillion-g6",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}
{
       "message" => "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n\t\t<CodeBus>186</CodeBus>\r\n\t\t<CodeCh>5531</CodeCh>\r\n\t\t<CodeConv>5531</CodeConv>\r\n\t\t<Codeligne>12</Codeligne>\r\n\t\t<Date>20150903</Date>\r\n\t\t<Heur>1101</Heur>\r\n\t\t<NomFR1>SOUK AHAD</NomFR1>\r\n\t\t<NomAR1>??? ?????</NomAR1>\r\n\t\t<NomFR2>SOVIVA </NomFR2>\r\n\t\t<NomAR2>??????</NomAR2>\r\n\t\t<Prix>0.66</Prix>\r\n\t\t<IDTicket>1</IDTicket>\r\n\t\t<CodeRoute>107</CodeRoute>\r\n\t\t<origine>01</origine>\r\n\t\t<Distination>07</Distination>\r\n\t\t<Num>3</Num>\r\n\t\t<Ligne>107</Ligne>\r\n\t\t<requisition> </requisition>\r\n\t\t<voyage>0</voyage>\r\n\t\t<faveur> </faveur>\r\n\t</H_Ticket>",
      "@version" => "1",
    "@timestamp" => "2016-07-03T12:13:28.892Z",
    "input_type" => "log",
        "source" => "C:\\busesdata\\ticket2.xml",
        "offset" => 125,
          "type" => "ticket",
         "count" => 1,
        "fields" => nil,
          "beat" => {
        "hostname" => "hp-pavillion-g6",
            "name" => "hp-pavillion-g6"
    },
          "host" => "hp-pavillion-g6",
          "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

Can you try editing the xpath configuration in the filter as follows:

filter 
{
    xml 
    {
        source => "ticket"
        xpath => 
        [
            "/IDH_Ticket/text()", "ticketId",
            "/CodeBus/text()", "codeBus",
            "/CodeCh/text()", "codeCh",
            "/CodeConv/text()", "codeConv",
            "/Codeligne/text()", "codeLigne",
            "/Date/text()", "date",
            "/Heur/text()", "heure",
            "/NomFR1/text()", "nomFR1",
            "/NomAR1/text()", "nomAR1",
            "/NomFR2/text()", "nomFR2",
            "/NomAR2/text()", "nomAR2",
            "/Prix/text()", "prix",
            "/IDTicket/text()", "idTicket",
            "/CodeRoute/text()", "codeRoute",
            "/origine/text()", "origine",
            "/Distination/text()", "destination",
            "/Num/text()", "num",
            "/Ligne/text()", "ligne",
            "/requisition/text()", "requisition",
            "/voyage/text()", "voyage",
            "/faveur/text()", "faveur"
        ]
        store_xml => true
        target => "doc"
    }
}
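One thing worth checking with either version of the xpath list: in XPath an absolute path starts at the document node, so its first step has to name the root element of the parsed XML. In the events shown above that root element is H_Ticket, which suggests that neither /ticket/IDH_Ticket/text() nor /IDH_Ticket/text() would select anything, whereas /H_Ticket/IDH_Ticket/text() would. The Python toy below models only that one rule (child steps plus a trailing text()); eval_abs_path is a hypothetical helper written for illustration, not the Nokogiri-based engine the Logstash xml filter actually uses.

```python
import xml.etree.ElementTree as ET
from typing import List


def eval_abs_path(xml_text: str, path: str) -> List[str]:
    """Evaluate a simple absolute path such as /A/B/text().

    Toy model: only child steps and a trailing text() are supported.
    """
    root = ET.fromstring(xml_text.strip())
    steps = [s for s in path.split("/") if s]
    if steps and steps[-1] == "text()":
        steps = steps[:-1]
    # The first step of an absolute path is matched against the root element.
    if not steps or steps[0] != root.tag:
        return []
    nodes = [root]
    for step in steps[1:]:
        nodes = [child for node in nodes for child in node if child.tag == step]
    return [node.text for node in nodes if node.text is not None]


ticket = ("\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n"
          "\t\t<CodeBus>186</CodeBus>\r\n\t</H_Ticket>")

for p in ("/ticket/IDH_Ticket/text()", "/IDH_Ticket/text()",
          "/H_Ticket/IDH_Ticket/text()"):
    print(p, eval_abs_path(ticket, p))  # [], [], then ['1']
```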

The xml filter won't work because its source setting points to a field that does not exist.
There is no ticket field in your document; ticket is only the value of the type field:

{
    "message" => "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n\t\t<CodeBus>186</CodeBus>\r\n\t\t<CodeCh>5531</CodeCh>\r\n\t\t<CodeConv>5531</CodeConv>\r\n\t\t<Codeligne>12</Codeligne>\r\n\t\t<Date>20150903</Date>\r\n\t\t<Heur>1101</Heur>\r\n\t\t<NomFR1>SOUK AHAD</NomFR1>\r\n\t\t<NomAR1>??? ?????</NomAR1>\r\n\t\t<NomFR2>SOVIVA </NomFR2>\r\n\t\t<NomAR2>??????</NomAR2>\r\n\t\t<Prix>0.66</Prix>\r\n\t\t<IDTicket>1</IDTicket>\r\n\t\t<CodeRoute>107</CodeRoute>\r\n\t\t<origine>01</origine>\r\n\t\t<Distination>07</Distination>\r\n\t\t<Num>3</Num>\r\n\t\t<Ligne>107</Ligne>\r\n\t\t<requisition> </requisition>\r\n\t\t<voyage>0</voyage>\r\n\t\t<faveur> </faveur>\r\n\t</H_Ticket>",
    "@version" => "1",
    "@timestamp" => "2016-07-03T12:13:28.892Z",
    "input_type" => "log",
    "source" => "C:\\busesdata\\ticket2.xml",
    "offset" => 125,
    "type" => "ticket",
    "count" => 1,
    "fields" => nil,
    "beat" => {
        "hostname" => "hp-pavillion-g6",
        "name" => "hp-pavillion-g6"
    },
    "host" => "hp-pavillion-g6",
    "tags" => [
        [0] "beats_input_codec_plain_applied"
    ]
}

You should change the xml filter to:

 xml {
        source => "message"
        ...
 }
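To see why the source setting matters, here is a toy Python model of the xpath part of the xml filter, under stated assumptions: the event is a plain dict, apply_xml_filter is a hypothetical helper, only simple /Root/Child/text() paths are handled, store_xml and target are not modelled, and the xpath entries are assumed to start at the H_Ticket root element. If the source field is missing the event passes through unchanged; with source pointed at message, the ticket fields are extracted.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict


def apply_xml_filter(event: Dict[str, Any], source: str,
                     xpaths: Dict[str, str]) -> Dict[str, Any]:
    """Toy model of the xml filter's xpath option (not Logstash code)."""
    value = event.get(source)
    if value is None:
        # Source field is absent: nothing is parsed and no fields are added.
        return event
    root = ET.fromstring(value.strip())
    result = dict(event)
    for path, field in xpaths.items():
        steps = [s for s in path.split("/") if s]
        if steps and steps[-1] == "text()":
            steps = steps[:-1]
        # An absolute path must begin with the document's root element.
        if len(steps) < 2 or steps[0] != root.tag:
            continue
        matches = [el.text for el in root.findall("/".join(steps[1:]))
                   if el.text is not None]
        if matches:
            result[field] = matches  # stored as a list in this model
    return result


event = {
    "message": "\t<H_Ticket>\r\n\t\t<IDH_Ticket>1</IDH_Ticket>\r\n"
               "\t\t<CodeBus>186</CodeBus>\r\n\t</H_Ticket>",
    "type": "ticket",
}
paths = {
    "/H_Ticket/IDH_Ticket/text()": "ticketId",
    "/H_Ticket/CodeBus/text()": "codeBus",
}

print(apply_xml_filter(event, "ticket", paths).get("ticketId"))  # None
print(apply_xml_filter(event, "message", paths)["ticketId"])    # ['1']
```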
