Python regex to detect underscores between letters

Question

How do I write a regex in Python that returns a string only when all of its underscores are between lowercase letters? For example, it should detect and return: 'aa_bb_cc', 'swd_qq', 'hello_there_friend'

But it should not return these: 'aA_bb', 'aa_', '_ddQ', 'aa_baa_2cs'

My code is ([a-z]+_[a-z]+)+ , but it returns only one underscore. It should return all underscores separated by lowercase letters.

For example, when I pass the string "aab_cbbbc_vv", it returns only 'aab_cbbbc' instead of 'aab_cbbbc_vv'.

Thank you

Answer 1

Your regex is almost correct. If you change it to:

^([a-z]+)(_[a-z]+)+$

It works, as you can check here.

^ - matches the beginning of the string

$ - matches the end of the string

You need these anchors so that you don't get partial matches from the strings you don't want to match.
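As a minimal sketch of how the anchored pattern behaves, the snippet below runs it over the example strings from the question. The names PATTERN, candidates, and accepted are only illustrative and are not part of the original answer.

```python
import re

# Anchored pattern from this answer: a lowercase run, then one or more
# "_" + lowercase-run groups, covering the whole string.
PATTERN = re.compile(r"^([a-z]+)(_[a-z]+)+$")

# Example strings taken from the question (wanted first, unwanted after).
candidates = ["aa_bb_cc", "swd_qq", "hello_there_friend",
              "aA_bb", "aa_", "_ddQ", "aa_baa_2cs"]

# Keep only the strings that match from start to end.
accepted = [s for s in candidates if PATTERN.match(s)]
print(accepted)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

Because the pattern is anchored, this checks one whole string at a time; it is not meant for pulling several words out of a longer text.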

Answer 2

The reason you get only results with one underscore for your example data is that ([a-z]+_[a-z]+)+ repeats a match of [a-z]+, then an underscore, and then [a-z]+ again.

That can match, for example, a_b or a_bc_d, but gives only a partial match for a_b_c, because every iteration needs at least one character a-z before its _.
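The behaviour described above can be reproduced with a short sketch. It assumes the question's pattern was written with [a-z] character classes; the variable names are illustrative only.

```python
import re

# The pattern from the question (assumed to use [a-z] classes).
original = re.compile(r"([a-z]+_[a-z]+)+")

# After the first iteration consumes "aab_cbbbc", the next iteration would
# need a letter before "_vv", so the search stops with a partial match.
partial = original.search("aab_cbbbc_vv").group(0)
print(partial)  # aab_cbbbc

# The pattern can cover all of "a_bc_d" (as "a_b" + "c_d"),
# but it can never cover all of "a_b_c".
covers_a_bc_d = original.fullmatch("a_bc_d") is not None
covers_a_b_c = original.fullmatch("a_b_c") is not None
print(covers_a_bc_d, covers_a_b_c)  # True False
```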

You could update your pattern to:

\b[a-z]+(?:_[a-z]+)+\b

Explanation

\b A word boundary
[a-z]+ Match 1+ chars a-z
(?:_[a-z]+)+ Repeat 1+ times matching _ and 1+ chars a-z
\b A word boundary

Regex demo
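As a rough illustration of the word-boundary version, the sketch below applies it with re.findall to a space-separated string built from the question's examples. The names WORD_PATTERN, text, and found are hypothetical.

```python
import re

# Word-boundary pattern from this answer; the group is non-capturing,
# so re.findall returns the full matches.
WORD_PATTERN = re.compile(r"\b[a-z]+(?:_[a-z]+)+\b")

# The question's examples joined into one string (an assumed test input).
text = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"

found = WORD_PATTERN.findall(text)
print(found)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

Since _ and digits count as word characters, \b cannot match in the middle of 'aa_baa_2cs', which is why no partial 'aa_baa' is returned here.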

Answer 3

Try this code:

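import re

# Sample words separated by spaces: the first three are valid, the rest are not.
s = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"
# A lowercase letter, then letters/underscores, then a final "_" plus lowercase letters.
print(re.findall(r"[a-z][a-z_]+_[a-z]+", s))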

The output should be:

['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aa_baa']

Note that 'aa_baa' is a partial match taken from 'aa_baa_2cs', which the question wants to exclude.
