Java regex: lowercase letters and underscores on the left, an equals sign in the middle, anything on the right
I'm reading a file line by line and want only lines like these:
gui_test=Hallo
gui_test_hello=Hello
So the left side contains only lowercase words separated by _. In the middle there is always an =, and the right side can be anything.
I made this regex:
^(([a-z]+_[a-z]+)+=.*)$
The problem is that it works for:
gui_pfc_button_ok=Ok!
But it does not work for:
gui_pfcuser_opportunitydetails_label_title=label Title
I don't know where my problem is.
The problem you're facing is that a repeated capture group only keeps its last iteration (at least in most engines; .NET is an exception here). You're using ([a-z]+_[a-z]+)+ and, while this is valid, the group only retains the last occurrence that matches this pattern, so you only get l_title. Because matching is greedy, that last iteration starts at l_: each previous iteration took as many characters as it could, leaving only one letter before the next underscore. Breaking this down, you're actually matching in the following way:
gui_pfcuse
r_opportunitydetail
s_labe
l_title
Your current regex will also fail if there is only one letter between underscores, as in something_here_a_test.
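Both points can be checked directly in Java. The following is a minimal sketch, not part of the original answer: the class name RepeatedGroupDemo and the helper lastInnerGroup are hypothetical, and the pattern is the one from the question, unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // The regex from the question, unchanged.
    static final Pattern ORIGINAL = Pattern.compile("^(([a-z]+_[a-z]+)+=.*)$");

    // Returns what the repeated inner group (group 2) holds after a full match,
    // or null if the line does not match at all.
    static String lastInnerGroup(String line) {
        Matcher m = ORIGINAL.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        // Only the last iteration of the repeated group is kept.
        System.out.println(lastInnerGroup(
                "gui_pfcuser_opportunitydetails_label_title=label Title")); // l_title

        // The single-letter word "a" cannot be split between two iterations.
        System.out.println(lastInnerGroup("something_here_a_test=More words")); // null
    }
}
```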
Use the following regex instead:
^([a-z]+(?:_[a-z]+)*)=(.*)$
You can also use ^((?:[a-z]+_)*[a-z]+)=(.*)$ but it's less efficient (it takes more steps than the regex above).
Input:
gui_test=Hallo gui_test_hello=Hello
gui_pfcuser_opportunitydetails_label_title=label Title
gui_pfc_button_ok=Ok!
something_here_a_test=More words

Results:
Match: gui_test=Hallo gui_test_hello=Hello
Group 1: gui_test
Group 2: Hallo gui_test_hello=Hello

Match: gui_pfcuser_opportunitydetails_label_title=label Title
Group 1: gui_pfcuser_opportunitydetails_label_title
Group 2: label Title

Match: gui_pfc_button_ok=Ok!
Group 1: gui_pfc_button_ok
Group 2: Ok!

Match: something_here_a_test=More words
Group 1: something_here_a_test
Group 2: More words
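Since the question is about Java, here is a minimal sketch of how these results could be reproduced there. The class name KeyValueLineFilter and the method parse are hypothetical names chosen for illustration, and the extra line Not_A_Key=ignored is an assumed example of input that should be skipped.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueLineFilter {
    // Group 1: lowercase words joined by single underscores. Group 2: everything after the first '='.
    static final Pattern LINE = Pattern.compile("^([a-z]+(?:_[a-z]+)*)=(.*)$");

    // Keeps only the lines that match and maps group 1 (key) to group 2 (value).
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.matches()) {
                result.put(m.group(1), m.group(2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "gui_test=Hallo gui_test_hello=Hello",
                "gui_pfcuser_opportunitydetails_label_title=label Title",
                "gui_pfc_button_ok=Ok!",
                "something_here_a_test=More words",
                "Not_A_Key=ignored"); // uppercase letters on the left: skipped
        parse(lines).forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```

When reading from a file, the list passed to parse could come from Files.readAllLines(path), so each element is already a single line without its line terminator.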
^  Assert position at the start of the line
([a-z]+(?:_[a-z]+)*)  Capture the following into capture group 1
    [a-z]+  Match any lowercase ASCII letter one or more times
    (?:_[a-z]+)*  Match the following any number of times. If you require at least one match, you can change * to + so that you end up with (?:_[a-z]+)+
        _  Match this literally
        [a-z]+  Match any lowercase ASCII letter one or more times
=  Match this literally
(.*)  Capture any character (except newline characters) any number of times into capture group 2
$  Assert position at the end of the line

^(([a-z]+_)+[a-z]+\=.*)$
Tested on https://regex101.com/, where the various parts are:
^ Asserts position at start of a line
([a-z]+_)+ One or more sequences of the form <one_or_more_lower_case>_
[a-z]+ One or more lowercase letters
\= escaped equals sign
.* zero or more characters
$ end of line
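In Java source the pattern above needs one adjustment: the backslash has to be doubled inside a string literal. The sketch below assumes a hypothetical class EscapedEqualsDemo with a helper accepts. It also shows that this pattern requires at least one underscore on the left side. The escape before = is harmless but optional, because = is not a metacharacter on its own.

```java
import java.util.regex.Pattern;

class EscapedEqualsDemo {
    // In a Java string literal the regex escape \= is written as "\\=".
    static final Pattern AT_LEAST_ONE_UNDERSCORE =
            Pattern.compile("^(([a-z]+_)+[a-z]+\\=.*)$");

    static boolean accepts(String line) {
        return AT_LEAST_ONE_UNDERSCORE.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("gui_test=Hallo"));                   // true
        System.out.println(accepts("something_here_a_test=More words")); // true
        System.out.println(accepts("gui=Hallo"));                        // false: no underscore on the left
    }
}
```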
Just move your underscore into the character class brackets, and you're home. You can also remove the + after the inner group if you want:
^(([a-z]+[_a-z]+)=.*)$
If you want, you can try it out at regex101.com.
If you don't want to match lines with two consecutive underscores, you can group on the underscore part, in pseudocode xxx(_xxx)*, which would then become:
^([a-z]+(_[a-z]+)*+=.*)$
Also testable at regex101.com.
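The difference between the two patterns in this answer can be seen with a short Java sketch. The class name UnderscoreStrictnessDemo and the helpers loose and strict are hypothetical. The *+ in the second pattern is a possessive quantifier, which java.util.regex supports.

```java
import java.util.regex.Pattern;

class UnderscoreStrictnessDemo {
    // Underscore inside the character class: runs of underscores are accepted.
    static final Pattern LOOSE = Pattern.compile("^(([a-z]+[_a-z]+)=.*)$");
    // Grouped form xxx(_xxx)*: every underscore must be followed by at least one letter.
    static final Pattern STRICT = Pattern.compile("^([a-z]+(_[a-z]+)*+=.*)$");

    static boolean loose(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean strict(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println(loose("gui_test=Hallo"));   // true
        System.out.println(strict("gui_test=Hallo"));  // true
        System.out.println(loose("gui__test=Hallo"));  // true: consecutive underscores slip through
        System.out.println(strict("gui__test=Hallo")); // false
        System.out.println(loose("gui_=Hallo"));       // true: trailing underscore slips through
        System.out.println(strict("gui_=Hallo"));      // false
    }
}
```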