
Logstash grok filter is slower than Java regex pattern matching

I have some syslog messages in RFC-5424 format.

I am parsing the logs with both the Logstash grok filter and Java regex pattern matching, and comparing the two approaches on the same input.

The Java regex parser reads its input from stdin and runs in a single thread.
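For reference, a minimal sketch of what such a single-threaded Java harness could look like. The class name `SyslogRegexBench`, the `parse` helper, and the pattern itself are assumptions made for illustration: the pattern is a simplified take on the RFC 5424 header (it does not handle escaped `]` inside structured data) and is not the regex actually used in the measurements.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyslogRegexBench {
    // Simplified RFC 5424 pattern (assumption, for illustration only).
    // Compiled once and reused for every line.
    static final Pattern RFC5424 = Pattern.compile(
        "^<(?<pri>\\d{1,3})>(?<version>\\d{1,2}) (?<timestamp>\\S+) (?<host>\\S+) "
        + "(?<app>\\S+) (?<procid>\\S+) (?<msgid>\\S+) "
        + "(?<sd>-|(?:\\[.*?\\])+)(?: (?<msg>.*))?$");

    // Returns the matcher when the whole line matches, otherwise null.
    static Matcher parse(String line) {
        Matcher m = RFC5424.matcher(line);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        long matched = 0;
        long start = System.nanoTime();
        // Read stdin line by line on the main thread only.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                total++;
                if (parse(line) != null) {
                    matched++;
                }
            }
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("lines=" + total + " matched=" + matched
            + " elapsedMs=" + elapsedMs);
    }
}
```

It could be run as `java SyslogRegexBench < messages.log`, where `messages.log` is a hypothetical file name. Note that the timer here covers only reading and matching, not JVM startup, so it is not directly comparable to the wall-clock time of a full Logstash run.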

Logstash also reads its input from stdin. I have set the number of pipeline workers to one and the pipeline batch size to the number of messages, so that all messages are processed in a single batch.

I can see that Java regular expression parsing is much faster than the grok filter.

Observations:

Input: 300K messages

Java regex: 1,500 ms

Logstash Grok: more than 1 minute every time

Why is Logstash Grok so much slower than Java regex? Grok is also supposed to use Java regexes in the backend.

Without more details, it is hard to tell why a specific pattern is slower or faster in Grok than in Java, but one thing is certain: the regex engines are different.

Java uses its built-in java.util.regex package, while Grok uses the Oniguruma regex engine. The two engines may handle the same pattern and the same input string in different ways, so their performance can differ.
