
grok regex in Logstash to parse and extract fields

I am trying to extract certain fields from a single message field. I want to do this with a grok regex in Logstash so that I can view the fields in Kibana.

My log events look like this:

[2021-01-06 12:10:40] ApiLogger.INFO: API log data: {"endpoint":"/rest/thre_en/V1/temp-carts/13cEIQqUb6cUfxB/tryer-inform","http_method":"GET","payload":[],"user_id":0,"user_type":4,"http_response_code":200,"response":"{\"pay_methods\":[{\"code\":\"frane\",\"title\":\"R2 Partial redeem\"}],\"totals\":{\"grand_total\":0,\"base_grand_total\":0}}

The full log contains more key-value pairs. Basically, I need the following information:

  1. timestamp (I am able to get this)
  2. log level (I am able to get this); for the log level, I only want INFO, not the entire ApiLogger.INFO
  3. endpoint
  4. http_method
  5. user_id
  6. user_type
  7. http_response_code
  8. response

I am not able to get the information for items 3-8. I tested it, and I think it is due to the colon (:). This is what I tried in the grok debugger:

%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):

I tried URI and other patterns, but they did not work, maybe because of the colon.

You can use

%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}

Then, you can parse the json_field with the JSON filter.
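A minimal sketch of how these two steps might be wired together in a Logstash pipeline. The field name json_field comes from the pattern above; the target name api is a hypothetical choice for illustration, and removing json_field afterwards is optional.

```
filter {
  grok {
    # Same pattern as above: timestamp, log level without the
    # "ApiLogger." prefix, the "API log data" label, then the rest
    # of the line as raw JSON text.
    match => {
      "message" => "%{SYSLOG5424SD:logtime} ApiLogger.%{LOGLEVEL:loglevel}: (?<API>\w+ \w+ \w+):\s*%{GREEDYDATA:json_field}"
    }
  }

  json {
    # Parse the captured JSON text into structured fields.
    source => "json_field"
    # Hypothetical target name: parsed keys end up under [api],
    # e.g. [api][endpoint] and [api][user_id].
    target => "api"
    # Drop the raw JSON string once it has been parsed.
    remove_field => ["json_field"]
  }
}
```

With this layout, endpoint, http_method, user_id, user_type, http_response_code and response should all be available under api without writing a separate capture for each one. Note that response is itself a JSON-encoded string, so it stays a string unless it is parsed again in a further step.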

If you want to play around with regex, you should remember that the regex engine parses a string from left to right by default. If you want to capture several fields with one regular expression, you should make sure the regex engine can "walk" all the way from one part to another. If you know which patterns and which types of chars occur between two fields, that is great. If not, you can only rely on the .* ( %{GREEDYDATA} ) or .*? ( %{DATA} ) patterns.

So, as an exercise, you might have a look at

%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"

Note the ++ in [0-9]++ and the .*? patterns between the fields. The ++ possessive quantifier makes sure the engine does not backtrack into the pattern it modifies if the subsequent patterns fail to match. [0-9]++ grabs a sequence of digits and does not give any of them back, and if the subsequent patterns fail, the whole match fails. .*? simply matches any zero or more chars other than line break chars, as few as possible. The last .* is greedy, because it must match as many chars other than line break chars as possible.
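As a sketch only, this is how the single-expression pattern above might be placed in a grok filter. The one assumption added here is the quoting: because the pattern contains double quotes, the string is wrapped in single quotes so that it does not need extra escaping.

```
filter {
  grok {
    match => {
      # Single-quoted so the literal " characters inside the
      # pattern do not terminate the string.
      "message" => '%{SYSLOG5424SD:logtime} %{JAVACLASS:loglevel}: (?<API>\w+ \w+ \w+):\s*\{"endpoint":"(?<endpoint>[^"]*)","http_method":"(?<http_method>[A-Z]++).*?"user_id":(?<user_id>[0-9]++).*?"user_type":(?<user_type>[0-9]++).*?"http_response_code":(?<http_response_code>[0-9]++).*?"response":"(?<response>.*)"'
    }
  }
}
```

Since this version still uses %{JAVACLASS:loglevel}, loglevel would hold ApiLogger.INFO rather than just INFO. The pattern also depends on the keys appearing in exactly this order, which is one reason the JSON filter approach is likely to be more robust.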

See the regex demo.
