
Facing an issue while sending data from Filebeat to multiple Logstash config files

To be precise, I am handling a log file which has almost millions of records. Since it is a billing summary log, customer information is recorded in no particular order.
I am using customized Grok patterns and the Logstash XML filter plugin to extract the data that is sufficient for tracking. To track the individual customer's activities, I am using "Customer_ID" as a unique key. So even though I am using multiple Logstash config files and multiple Grok patterns, all of a customer's information can be bound/aggregated using that "Customer_ID" (unique key).

Here is a sample from my log file:
7-04-2017 08:49:41 INFO abcinfo (ABC_RemoteONUS_Processor.java52) - Customer_Entry :::<?xml version="1.0" encoding="UTF-8"?><ns2:ReqListAccount xmlns:ns2="http://vcb.org/abc/schema/"/"><Head msgId="1ABCDEFegAQtQOSuJTEs3u" orgId="ABC" ts="2017-04-27T08:49:51+05:30" ver="1.0"/><Cust id="ABCDVFR233cd662a74a229002159220ce762c" note="Account CUST Listing" refId="DCVD849512576821682" refUrl="http://www.ABC.org.in/" ts="2017-04-27T08:49:51+05:30"

My Grok pattern:

grok {
  patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
  match => [ "message", "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}" ]
  add_field => { "Details" => "Request" }
  remove_tag => ["_grokparsefailure"]
}

My customized pattern, which is stored inside the patterns_dir:

ABC ( - Customer_Entry :::)

My XML filter plugin:

xml {
  source => "Cust"
  store_xml => false
  xpath => [
    "//Head/@ts", "Cust_Req_time",
    "//Cust/@id", "Customer_ID",
    "//Cust/@note", "Cust_note"
  ]
}

So whatever details come after " - Customer_Entry :::", I am able to extract them using the XML filter plugin (they are stored similarly to a multiline codec). I have written 5 different Logstash config files to extract the different customer activities, with 5 different Grok patterns for:

1. Customer_Entry
2. Customer_Purchase
3. Customer_Last_Purchase
4. Customer_Transaction
5. Customer_Authorization

Each of the above Grok patterns captures a different set of information, which is grouped by Customer_ID as I said earlier.
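Roughly, each of these 5 config files is put together as sketched below (the Customer_Entry one is shown; the beats input port and the Elasticsearch output/index are placeholders for illustration, not my exact settings):

input {
  beats {
    port => 5044    # placeholder port
  }
}

filter {
  grok {
    patterns_dir => "D:\elk\logstash-5.2.1\vendor\bundle\jruby\1.9\gems\logstash-patterns-core-4.0.2\patterns"
    match => [ "message", "%{DATESTAMP:datestamp} %{LOGLEVEL:Logseverity}\s+%{WORD:ModuleInfo} \(%{NOTSPACE:JavaClass}\)%{ABC:Customer_Init}%{GREEDYDATA:Cust}" ]
    add_field => { "Details" => "Request" }
    remove_tag => ["_grokparsefailure"]
  }
  xml {
    source => "Cust"            # the XML payload captured by GREEDYDATA above
    store_xml => false
    xpath => [
      "//Head/@ts", "Cust_Req_time",
      "//Cust/@id", "Customer_ID",
      "//Cust/@note", "Cust_note"
    ]
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]                 # placeholder Elasticsearch endpoint
    index => "customer-entry-%{+YYYY.MM.dd}"    # placeholder index name
  }
}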

I am able to extract the information and visualize it clearly in Kibana without any flaw by using my customized patterns with the different log files.

Since I have hundreds of log files to feed into Logstash each and every day, I opted for Filebeat, but Filebeat runs with only one port, "5044". I tried running 5 different ports for the 5 different Logstash config files, but that was not working; only one of the 5 Logstash config files was getting loaded, and the rest of the config files stayed idle.
Here is a sample of my Filebeat output configuration:

output.logstash:
  hosts: ["localhost:5044"]

output.logstash:
  hosts: ["localhost:5045"]

output.logstash:
  hosts: ["localhost:5046"]

I couldn't add all the Grok patterns in one Logstash config file, because the XML filter plugin takes its source from the "GREEDYDATA" capture; in that case I would have 5 different source => fields for the 5 different Grok patterns. I even tried that too, but it was not working.

Looking for a better approach.

Sounds like you're looking for scale, with parallel ingestion. As it happens, Filebeat supports something called load balancing, which sounds like what you're looking for.

output.logstash:
  hosts: [ "localhost:5044", "localhost:5045", "localhost:5046" ]
  loadbalance: true
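
Each of the three Logstash pipelines then listens on its own port with a beats input, along these lines (a minimal sketch, assuming the ports from the hosts list above):

input {
  beats {
    port => 5044    # 5045 and 5046 in the other two pipelines
  }
}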

That covers the output side. Though, I believe you wanted multithreading on the input. Filebeat is supposed to track all files specified in the prospector config, but you've found limits. Globbing or specifying a directory will single-thread the files in that glob/directory. If your file names support it, creative globbing may get you better parallelism by defining multiple globs in the same directory.

Assuming your logs are coming in by type:

- input_type: log
  paths:
    - /mnt/billing/*entry.log
    - /mnt/billing/*purchase.log
    - /mnt/billing/*transaction.log

This would enable prospectors on multiple threads, reading these files in parallel.

If your logs were coming in with random names, you could use a similar setup:

- input_type: log
  paths:
    - /mnt/billing/a*
    - /mnt/billing/b*
    - /mnt/billing/c*
    [...]
    - /mnt/billing/z*

If you are processing lots of files with unique names that never repeat, adding the clean_inactive config option to your prospectors will keep your Filebeat running fast.

- input_type: log
  ignore_older: 18h
  clean_inactive: 24h
  paths:
    - /mnt/billing/a*
    - /mnt/billing/b*
    - /mnt/billing/c*
    [...]
    - /mnt/billing/z*

This will remove all state for files older than 24 hours, and won't bother processing any file more than 18 hours old.
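
Putting it all together, a single Filebeat configuration along these lines would combine the parallel prospectors with the load-balanced output (a sketch only; the filebeat.prospectors layout shown is the Filebeat 5.x style, and the paths/timings are copied from the examples above):

filebeat.prospectors:
  - input_type: log
    ignore_older: 18h
    clean_inactive: 24h
    paths:
      - /mnt/billing/*entry.log
      - /mnt/billing/*purchase.log
      - /mnt/billing/*transaction.log

output.logstash:
  hosts: ["localhost:5044", "localhost:5045", "localhost:5046"]
  loadbalance: true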
