Python Regex to detect underscore between letters
How do I make a regex in Python that returns a string in which all underscores are between lowercase letters? For example, it should detect and return:
'aa_bb_cc', 'swd_qq', 'hello_there_friend'
But it should not return these:
'aA_bb', 'aa_', '_ddQ', 'aa_baa_2cs'
My code is ([a-z]+_[a-z]+)+, but it returns only one underscore. It should return all underscores separated by lowercase letters.
For example, when I pass the string "aab_cbbbc_vv", it returns only 'aab_cbbbc' instead of 'aab_cbbbc_vv'.
Thank you
Your regex is almost correct. If you change it to:
^([a-z]+)(_[a-z]+)+$
it works as expected.
^ - matches the beginning of the string
$ - matches the end of the string
You need these so that you do not get partial matches on the strings you don't want to be matched.
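As a minimal sketch of the anchored pattern in use, the snippet below filters the example strings from the question. The names `pattern`, `candidates` and `accepted` are illustrative only and do not come from the original answer.

```python
import re

# Anchored pattern from the answer: the whole string must fit the pattern.
pattern = re.compile(r"^([a-z]+)(_[a-z]+)+$")

# Example strings taken from the question (wanted and unwanted ones mixed).
candidates = ["aa_bb_cc", "swd_qq", "hello_there_friend",
              "aA_bb", "aa_", "_ddQ", "aa_baa_2cs"]

# Keep only the strings that match from the first to the last character.
accepted = [s for s in candidates if pattern.match(s)]
print(accepted)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

Without `^` and `$`, a string such as 'aa_baa_2cs' would still yield the partial match 'aa_baa'.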
The reason you get only results with one underscore for your example data is that ([a-z]+_[a-z]+)+ repeats a match of [a-z]+, then an underscore, and then [a-z]+ again.
That can, for example, match a_b or a_bc_d in full, but gives only a partial match for a_b_c, as there has to be at least one char a-z present before each _ in every iteration.
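The behaviour described above can be checked with a short sketch. It assumes the pattern from the question was meant to use the range [a-z]; the variable names are illustrative. `re.search` shows what is actually returned for the string from the question, and `re.fullmatch` shows which strings the pattern is able to cover entirely.

```python
import re

# Pattern from the question, assuming the range [a-z] was intended.
original = re.compile(r"([a-z]+_[a-z]+)+")

# search() returns the first successful match, which is only a prefix here.
partial = original.search("aab_cbbbc_vv").group(0)
print(partial)  # aab_cbbbc

# fullmatch() tells us whether the pattern can cover the whole string.
coverage = {text: original.fullmatch(text) is not None
            for text in ["a_b", "a_bc_d", "a_b_c"]}
print(coverage)  # {'a_b': True, 'a_bc_d': True, 'a_b_c': False}
```

a_bc_d can be split into the two iterations a_b and c_d, whereas a_b_c leaves _c with no letter in front of the underscore.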
You could update your pattern to:
\b[a-z]+(?:_[a-z]+)+\b
Explanation
\b - a word boundary
[a-z]+ - match 1+ chars a-z
(?:_[a-z]+)+ - repeat 1+ times matching _ and 1+ chars a-z
\b - a word boundary
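A small sketch of this pattern with `re.findall`, run on a space-separated string built from the question's examples (the names `pattern`, `text` and `matches` are illustrative). Because the group is non-capturing, `findall` returns the whole match rather than the last repeated group.

```python
import re

# Word-boundary pattern from the answer; (?:...) keeps findall returning full matches.
pattern = re.compile(r"\b[a-z]+(?:_[a-z]+)+\b")

text = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"

matches = pattern.findall(text)
print(matches)  # ['aa_bb_cc', 'swd_qq', 'hello_there_friend']
```

Underscores, digits and uppercase letters all count as word characters, so \b does not match next to them. That is why 'aa_baa_2cs' gives no partial match here.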
Try this code to get it:
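import re
# Space-separated test strings from the question
s = "aa_bb_cc swd_qq hello_there_friend aA_bb aa_ _ddQ aa_baa_2cs"
# A lowercase letter, then letters/underscores, then a final _ followed by letters
print(re.findall(r"[a-z][a-z_]+_[a-z]+", s))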
The output should be:
['aa_bb_cc', 'swd_qq', 'hello_there_friend', 'aa_baa']
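Note that the last item, 'aa_baa', is a partial match taken from 'aa_baa_2cs', which the question wanted to exclude. The sketch below (variable names are illustrative) probes a few edge cases of this unanchored pattern.

```python
import re

# Same pattern as in the snippet above, without anchors or word boundaries.
loose = re.compile(r"[a-z][a-z_]+_[a-z]+")

# A prefix of a string that should be rejected is still returned.
prefix_hit = loose.findall("aa_baa_2cs")
print(prefix_hit)  # ['aa_baa']

# [a-z_]+ may consume underscores, so consecutive underscores are accepted.
double_underscore = loose.findall("a__b")
print(double_underscore)  # ['a__b']

# At least two characters are required before the final underscore.
short_prefix = loose.findall("a_b")
print(short_prefix)  # []
```

If these cases matter, an anchored or word-boundary pattern such as the ones in the other answers is likely the safer choice.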