Regex groups expression not capturing content

I'm trying to create a large regex that captures 6 groups. It will be used to parse Android logs that have the following format:

    2020-03-10T14:09:13.3250000 VERB    CallingClass    17503   20870   Whatever content: this log line had (etc)

The expression I've created so far is the following:

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w{+})\t(\d{5})\t(\d{5})\t(.*$)

The lines in this case are tab-separated, but the application I'm developing will be dynamic enough that this is not always the case. So I feel a regex is still the best option, even if it is heavier than performing a split.

Breaking down the groups in more detail, from my thought process:

  1. Match the date (I'm considering changing this to a fixed number of characters instead)

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})

  2. Match a block of 4 characters

    ([A-Za-z]{4})

  3. Match any number of characters until the next tab

    (\w{+})

  4. Match a block of 5 digits, 2 times

    \t(\d{5})

  5. Finally, match everything else until the end of the line

    \t(.*$)

If I reduce the expression to the following, it works:

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(.*$)

This leaves out 3 of the groups: the word and the 2 number blocks.

Any idea why this is?

Thank you.

The problem is that \w{+} matches a word character followed by one or more { characters and then a final } character. If you want one or more word characters, just use the plus without the curly braces. Curly braces are meant for specifying an exact count or a range of counts, and they match literal curly braces when their content does not follow that format.
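The behaviour described above can be checked with a small sketch. It uses Python's `re` module purely for illustration (an assumption on my part; the thread is about C#, whose engine may differ in other details), and the names `broken` and `fixed` are made up for this example.

```python
import re

# `{+}` is not a valid quantifier, so `{` and `}` are treated as literals
# and `+` applies to the literal `{`.
broken = re.compile(r"\w{+}")
# Plain `+` applies to `\w`: one or more word characters.
fixed = re.compile(r"\w+")

# One word character, then one or more '{', then a closing '}'.
print(bool(broken.fullmatch("a{{}")))          # True
# An ordinary word contains no braces, so the broken pattern fails.
print(bool(broken.fullmatch("CallingClass")))  # False
print(bool(fixed.fullmatch("CallingClass")))   # True
```

This is why the third group never matched the class name in the log line, and the whole expression failed with it.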

With that change, the full expression becomes:

    (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*$)
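As a rough check, the sketch below applies the corrected expression to the sample log line from the question. It again uses Python's `re` for illustration rather than C#. It assumes the fields in the real log are separated by single tab characters, and the variable names (`timestamp`, `level`, `tag`, `first_id`, `second_id`, `message`) are hypothetical labels, since the question does not say what each field means.

```python
import re

# The corrected expression, unchanged apart from being compiled.
LOG_LINE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.\d{7})"
    r"\t([A-Za-z]{4})\t(\w+)\t(\d{5})\t(\d{5})\t(.*$)"
)

# Sample line from the question, with the fields joined by real tabs (assumed).
sample = "\t".join([
    "2020-03-10T14:09:13.3250000",
    "VERB",
    "CallingClass",
    "17503",
    "20870",
    "Whatever content: this log line had (etc)",
])

match = LOG_LINE.match(sample)
if match:
    # Hypothetical field names, chosen only for readability.
    timestamp, level, tag, first_id, second_id, message = match.groups()
    print(timestamp, level, tag, first_id, second_id, message, sep=" | ")
else:
    print("no match")
```

If the separators are not always tabs, the `\t` tokens are the parts of the pattern that would need to change.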

I highly recommend using https://regex101.com/ for its explanation feature, to check whether your expression matches what you spelled out in words. However, for testing an expression that will be used in C#, you should use something else, such as http://regexstorm.net/tester
