
Java Regex with lowercase letters and underscores on the left, an equals sign in the middle, and anything on the right

I'm reading a file line by line and want only lines like:

gui_test=Hallo
gui_test_hello=Hello

So on the left side there are only lowercase words separated by _ . In the middle there is always an = , and on the right side there can be anything.

I made this regex:

^(([a-z]+_[a-z]+)+=.*)$

The problem is that it works for

gui_pfc_button_ok=Ok!

But it does not work for:

gui_pfcuser_opportunitydetails_label_title=label Title

I don't know where my problem is.

Brief

The problem you're facing is that a repeated capture group only keeps its last iteration (at least in most engines; .NET is an exception here). You're using ([a-z]+_[a-z]+)+ and, while this is valid, the group only holds the last occurrence that matches this pattern, so you're only getting l_title . Because the quantifiers are greedy, each iteration first takes as many characters as it can and then gives back one letter so that the next iteration has something to start with, which is why the last iteration begins with l_ . Breaking this down, you're actually matching in the following way:

  • gui_pfcuse
  • r_opportunitydetail
  • s_labe
  • l_title

Your current regex will also fail if you only have 1 letter between underscores, such as something_here_a_test , because that single letter would have to end one word_word pair and start the next one at the same time. See your regex in use here
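
To see this behaviour from Java, here is a minimal sketch using java.util.regex with the regex from the question unchanged. The class name OriginalRegexDemo and the helper lastInnerGroup are made up for illustration; they are not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OriginalRegexDemo {
    // The regex from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(([a-z]+_[a-z]+)+=.*)$");

    // Hypothetical helper: returns what the repeated inner group (group 2)
    // holds after a successful match, or null if the line does not match.
    static String lastInnerGroup(String line) {
        Matcher m = ORIGINAL.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // The repeated group only keeps its last iteration.
        System.out.println(lastInnerGroup(
                "gui_pfcuser_opportunitydetails_label_title=label Title")); // l_title

        // A single letter between two underscores cannot be split across two pairs.
        System.out.println(lastInnerGroup("something_here_a_test=More words")); // null
    }
}
```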


Code

See regex in use here

^([a-z]+(?:_[a-z]+)*)=(.*)$

You can also use ^((?:[a-z]+_)*[a-z]+)=(.*)$ but it's less efficient (it uses more steps than the regex above).
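
A minimal Java sketch of how the first regex above could be applied line by line. The class name KeyValueLineFilter and the helper parse are hypothetical, and an in-memory list stands in for the file so the example is self-contained. Matcher.matches() already requires the whole line to match, so ^ and $ are redundant here but harmless.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueLineFilter {
    // Group 1: lowercase words joined by single underscores. Group 2: everything after '='.
    static final Pattern LINE = Pattern.compile("^([a-z]+(?:_[a-z]+)*)=(.*)$");

    // Hypothetical helper: returns {key, value} for a matching line, or null otherwise.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Stand-in for the lines of the file.
        List<String> lines = List.of(
                "gui_pfcuser_opportunitydetails_label_title=label Title",
                "something_here_a_test=More words",
                "Gui_test=rejected because of the uppercase letter",
                "gui__test=rejected because of the double underscore");

        for (String line : lines) {
            String[] kv = parse(line);
            if (kv != null) {
                System.out.println(kv[0] + " -> " + kv[1]);
            }
        }
    }
}
```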


Results

Input

gui_test=Hallo gui_test_hello=Hello
gui_pfcuser_opportunitydetails_label_title=label Title
gui_pfc_button_ok=Ok!
something_here_a_test=More words

Output

  1. Match: gui_test=Hallo gui_test_hello=Hello
    • Group 1: gui_test
    • Group 2: Hallo gui_test_hello=Hello
  2. Match: gui_pfcuser_opportunitydetails_label_title=label Title
    • Group 1: gui_pfcuser_opportunitydetails_label_title
    • Group 2: label Title
  3. Match: gui_pfc_button_ok=Ok!
    • Group 1: gui_pfc_button_ok
    • Group 2: Ok!
  4. Match: something_here_a_test=More words
    • Group 1: something_here_a_test
    • Group 2: More words

Explanation

  • ^ Assert position at the start of the line
  • ([a-z]+(?:_[a-z]+)*) Capture the following into capture group 1
    • [a-z]+ Match any lowercase ASCII letter one or more times
    • (?:_[a-z]+)* Match the following any number of times. If you require at least one match you can change * to + such that you end up with (?:_[a-z]+)+
      • _ Match this literally
      • [a-z]+ Match any lowercase ASCII letter one or more times
  • = Match this literally
  • (.*) Capture any character (except newline characters) any number of times into capture group 2
  • $ Assert position at the end of the line

^(([a-z]+_)+[a-z]+\=.*)$

Tested on https://regex101.com/ where the various parts are:

^          Asserts position at start of a line
([a-z]+_)+ One or more sequences of the form <one_or_more_lower_case>_
[a-z]+     One or more lower case letters
\=         Escaped equals sign
.*         Zero or more characters
$          End of line
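
In Java source the backslash itself has to be escaped inside a string literal, so \= is written as "\\=". The sketch below assumes String.matches is acceptable for a quick check; the class name SecondAnswerDemo and the helper accepts are hypothetical. Note that ([a-z]+_)+ requires at least one underscore on the left side.

```java
class SecondAnswerDemo {
    // "\\=" in Java source is the regex \= (an escaped, literal equals sign).
    static final String REGEX = "^(([a-z]+_)+[a-z]+\\=.*)$";

    // Hypothetical helper: true if the whole line matches the regex.
    static boolean accepts(String line) {
        return line.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(accepts("gui_test=Hallo"));                   // true
        System.out.println(accepts("something_here_a_test=More words")); // true
        System.out.println(accepts("gui=Hallo"));                        // false, no underscore
    }
}
```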

Just move your underscore into the character class brackets, and you're home. You can remove the + for the inner group as well if you want...

^(([a-z]+[_a-z]+)=.*)$

If you want, you can try it out at regex101.com .

If you don't want to match lines with two consecutive underscores, you can group on the underscore part, in pseudocode xxx(_xxx)* , which would then become

^([a-z]+(_[a-z]+)*+=.*)$

Also testable at regex101.com .
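
A small Java sketch comparing the two variants from this answer. The class and method names are made up for illustration. In java.util.regex the *+ in the second pattern is read as a possessive quantifier; that does not change the result here, because neither _ nor [a-z] can ever consume the = that follows.

```java
import java.util.regex.Pattern;

class UnderscoreVariantsDemo {
    // Underscore inside the character class: also accepts "__" and a trailing "_".
    static final Pattern CHAR_CLASS = Pattern.compile("^(([a-z]+[_a-z]+)=.*)$");

    // Grouped underscore part, xxx(_xxx)*: single underscores between words only.
    static final Pattern GROUPED = Pattern.compile("^([a-z]+(_[a-z]+)*+=.*)$");

    static boolean charClassAccepts(String line) {
        return CHAR_CLASS.matcher(line).matches();
    }

    static boolean groupedAccepts(String line) {
        return GROUPED.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "gui_test=Hallo", "gui__test=Hallo", "gui_=Hallo" };
        for (String line : samples) {
            System.out.println(line
                    + " charClass=" + charClassAccepts(line)
                    + " grouped=" + groupedAccepts(line));
        }
    }
}
```

Only the first sample is accepted by both patterns; the other two pass the character-class version but not the grouped one.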

