How do I split a filename with Logstash Grok?

Question

One of these days I'll learn regex.

I have the following filename:

PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log

I'm able to get this to work so far with...

(?<logtype>[^-]+)-(?<run_id>[^-]+)-(?<job_id>[^-]+)-(?<capability>[^(0-9\.0-9\.0-9)]+)

logtype: PE
run_id: run1000hbgmm3f1
job_id: job1000hbgmm3dt

But I'm getting

capability: Output-Workflow-

...though I want it to be

capability: Output-Workflow-1000hbgmm3fb

...that is, all the text after the job_id up to the timestamp HH.mm.ss. Any help please? Thanks!

Answer 1

It is because you cannot negate a sequence of symbols with a negated character class. [^(0-9\.0-9\.0-9)] matches any single char other than ( , a digit, . and ) . That is why the capture stops at the first digit of 1000hbgmm3fb.
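To make the point about the character class concrete, here is a small sketch in Python rather than in Logstash itself. It assumes Python's re module treats this particular class the same way the grok engine does, and it uses Python's (?P<name>...) spelling for named groups where grok uses (?<name>...). The variable names are only illustrative.

```python
import re

filename = "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"
prefix = r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"

# The class from the question, copied as written.
as_written = re.compile(prefix + r"(?P<capability>[^(0-9\.0-9\.0-9)]+)")
# The same set of excluded characters, written without the repetition.
simplified = re.compile(prefix + r"(?P<capability>[^().0-9]+)")

capability = as_written.match(filename).group("capability")
simplified_capability = simplified.match(filename).group("capability")

print(repr(capability))                      # 'Output-Workflow-'
print(capability == simplified_capability)   # True
```

Both patterns stop at the first digit, which shows the class is just a set of single characters and not a description of the timestamp.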

You may replace your (?<capability>[^(0-9\.0-9\.0-9)]+) with (?<capability>.*?)-\d{2}\.\d{2}\.\d{2} to get the right value.

Now, the (?<capability>.*?)-\d{2}\.\d{2}\.\d{2} will match any 0+ chars (and capture them into the "capability" group), as few as possible (since *? is a lazy quantifier), up to the first occurrence of - followed with 2 digits and then 2 sequences of a dot ( \. ) followed with 2 digits.
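The behaviour of the lazy quantifier can be checked outside Logstash with a quick sketch like the one below. It is written in Python, so the named groups use (?P<name>...) instead of grok's (?<name>...), and split_filename is a hypothetical helper name, not part of Logstash or grok.

```python
import re

# Full pattern: the three dash-separated fields from the question,
# then the lazy capability group, then the dotted timestamp.
FILENAME_RE = re.compile(
    r"(?P<logtype>[^-]+)-(?P<run_id>[^-]+)-(?P<job_id>[^-]+)-"
    r"(?P<capability>.*?)-\d{2}\.\d{2}\.\d{2}"
)


def split_filename(name):
    """Return the named fields as a dict, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    return match.groupdict() if match else None


fields = split_filename(
    "PE-run1000hbgmm3f1-job1000hbgmm3dt-Output-Workflow-1000hbgmm3fb-22.07.17.log"
)
print(fields["capability"])  # Output-Workflow-1000hbgmm3fb
```

The lazy group keeps growing past each - until what follows is exactly two digits, a dot, two digits, a dot and two digits, so the dashes inside Output-Workflow-1000hbgmm3fb are kept in the capture.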

See the regex demo at regex101.com.

Answer 1
0 · accepted · 2017-04-17 16:45:53