Java regex for lowercase letters and underscores on the left, an equals sign in the middle, and anything on the right

Question

I'm reading a file line by line and want to keep only lines like these:

gui_test=Hallo
gui_test_hello=Hello

So the left side contains only lowercase words separated by _. In the middle there is always an = and the right side can be anything.

I made this regex:

^(([a-z]+_[a-z]+)+=.*)$

The problem is that it works for:

gui_pfc_button_ok=Ok!

But not for:

gui_pfcuser_opportunitydetails_label_title=label Title

I don't know where my problem is.

Answer 1

Brief

The problem you're facing is that a repeated capture group keeps only its last iteration (at least in most engines; .NET is an exception here). You're using ([a-z]+_[a-z]+)+ and, while this is valid, the group only retains the last occurrence that matches this pattern, so you only get l_title. The capture starts with l_ because matching is greedy: each iteration takes as many characters as it can and gives back only the single letter the next iteration needs. Broken down, the iterations actually match as follows:

gui_pfcuse
r_opportunitydetail
s_labe
l_title

Your current regex will also fail if there is only 1 letter between underscores, as in something_here_a_test.

Code

^([a-z]+(?:_[a-z]+)*)=(.*)$

You can also use ^((?:[a-z]+_)*[a-z]+)=(.*)$ but it's less efficient (it takes more steps than the regex above).

Results

Input

gui_test=Hallo gui_test_hello=Hello
gui_pfcuser_opportunitydetails_label_title=label Title
gui_pfc_button_ok=Ok!
something_here_a_test=More words

Output

Match: gui_test=Hallo gui_test_hello=Hello
- Group 1: gui_test
- Group 2: Hallo gui_test_hello=Hello
Match: gui_pfcuser_opportunitydetails_label_title=label Title
- Group 1: gui_pfcuser_opportunitydetails_label_title
- Group 2: label Title
Match: gui_pfc_button_ok=Ok!
- Group 1: gui_pfc_button_ok
- Group 2: Ok!
Match: something_here_a_test=More words
- Group 1: something_here_a_test
- Group 2: More words

Explanation

^ Assert position at the start of the line
([a-z]+(?:_[a-z]+)*) Capture the following into capture group 1
- [a-z]+ Match any lowercase ASCII letter one or more times
- (?:_[a-z]+)* Match the following any number of times. If you require at least one match, you can change * to + so that you end up with (?:_[a-z]+)+
  - _ Match this literally
  - [a-z]+ Match any lowercase ASCII letter one or more times
= Match this literally
(.*) Capture any character (except newline characters) any number of times into capture group 2
$ Assert position at the end of the line
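To connect this back to the original task of filtering lines read from a file, here is a hedged Java sketch that applies the pattern explained above to a list of lines. The class KeyValueLineFilter and its parse method are hypothetical names, and the sample lines stand in for file contents.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative only.
final class KeyValueLineFilter {
    // Group 1: lowercase words joined by single underscores.
    // Group 2: everything after the first '='.
    static final Pattern LINE = Pattern.compile("^([a-z]+(?:_[a-z]+)*)=(.*)$");

    // Returns {key, value} for a matching line, or null for any other line.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Stand-in for lines read from a file.
        List<String> lines = Arrays.asList(
                "gui_pfcuser_opportunitydetails_label_title=label Title",
                "something_here_a_test=More words",
                "Gui_Test=uppercase key, skipped",
                "gui__test=double underscore, skipped");
        for (String line : lines) {
            String[] kv = parse(line);
            if (kv != null) {
                System.out.println(kv[0] + " -> " + kv[1]);
            }
        }
    }
}
```

Because Matcher.matches() already requires the whole input to match, the ^ and $ anchors are redundant in this sketch; they are kept so the pattern stays identical to the one in the answer.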

Answer 2

^(([a-z]+_)+[a-z]+\=.*)$

Tested on https://regex101.com/ where the various parts are:

^          Asserts position at start of a line
([a-z]+_)+ One or more sequences of the form <one_or_more_lower_case>_
[a-z]+     One or more lower case letter
\=         Escaped equals sign
.*         zero or more characters
$          end of line

Answer 3

Just move your underscore into the character class brackets, and you're home. You can remove the + for the inner group as well if you want...

^(([a-z]+[_a-z]+)=.*)$

If you want, you can try it out at regex101.com.

If you don't want to match lines with two consecutive underscores, you can group on the underscore part, in pseudocode xxx(_xxx)*, which would then become:

^([a-z]+(_[a-z]+)*+=.*)$

Also testable at regex101.com.
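The difference between the two patterns in this answer can be checked with a short Java sketch. The class UnderscoreVariants and its method names are made up for illustration. Note that in Java the *+ in the second pattern is read as a possessive quantifier, which does not change which of the lines below are accepted.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class; the names are illustrative only.
final class UnderscoreVariants {
    // Underscore inside the character class: underscores may repeat or trail.
    static final Pattern LOOSE = Pattern.compile("^(([a-z]+[_a-z]+)=.*)$");
    // Underscore grouped with the following word: single separators only.
    static final Pattern STRICT = Pattern.compile("^([a-z]+(_[a-z]+)*+=.*)$");

    static boolean loose(String line) {
        return LOOSE.matcher(line).matches();
    }

    static boolean strict(String line) {
        return STRICT.matcher(line).matches();
    }

    public static void main(String[] args) {
        String[] samples = { "gui_test=Hallo", "gui__test=x", "gui_=x", "a=x" };
        for (String s : samples) {
            System.out.println(s + " loose=" + loose(s) + " strict=" + strict(s));
        }
    }
}
```

For these samples, the character-class version accepts gui__test=x and gui_=x but rejects the single-letter key a=x, while the grouped version does the opposite on all three.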
