How to ignore strings that start with a certain pattern using a regular expression in Python?
Accept and return @something but reject first@last.
r'@([A-Z][A-Z0-9_]*[A-Z0-9])'
The above regexp accepts @something (starts with a letter, ends with a letter or number, may have underscores in the middle, at least 2 characters long) and returns the part after the @ symbol.
I do not want to return matches that have letters or digits (A-Z, 0-9) directly before the @ symbol.
Spaces, new lines, special characters, etc. before @ are allowed.
CODE:
re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
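As a quick check of the problem, here is a minimal sketch that runs this pattern on the example sentence from above (the variable names `text` and `matches` are only illustrative). It shows that the unanchored pattern also returns the part after the @ in first@last:

```python
import re

text = "Accept and return @something but reject first@last."

# Nothing in this pattern looks at what comes before the "@",
# so the "@last" inside "first@last" is matched as well.
matches = re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
print(matches)  # ['something', 'last']
```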
Use
re.findall(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
(?<! look behind to see if there is not:
--------------------------------------------------------------------------------
[A-Z0-9] any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
) end of look-behind
--------------------------------------------------------------------------------
@ '@'
--------------------------------------------------------------------------------
( group and capture to \1:
--------------------------------------------------------------------------------
[A-Z] any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
[A-Z0-9_]* any character of: 'A' to 'Z', '0' to
'9', '_' (0 or more times (matching the
most amount possible))
--------------------------------------------------------------------------------
[A-Z0-9] any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
) end of \1
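
The breakdown above can be tried out with a short sketch. The helper name `find_mentions` and the sample strings other than the original sentence are assumptions made for illustration, not part of the question:

```python
import re

# Negative lookbehind: the "@" must not be directly preceded by a letter or digit.
MENTION = re.compile(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)

def find_mentions(text):
    """Return the names that follow a standalone '@' in text."""
    return MENTION.findall(text)

print(find_mentions("Accept and return @something but reject first@last."))  # ['something']
print(find_mentions("@alpha at the start"))   # ['alpha']   (nothing before the @)
print(find_mentions("line one\n@beta_2"))     # ['beta_2']  (newline before the @)
print(find_mentions("(@gamma)"))              # ['gamma']   (special character before the @)
print(find_mentions("user9@delta"))           # []          (digit before the @)
print(find_mentions("@x"))                    # []          (name shorter than 2 characters)
```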
You can use
\B@([A-Z][A-Z0-9_]*[A-Z0-9])
The pattern matches:
\B Assert a position where a word boundary does not match
@ Match literally
( Capture group 1
[A-Z][A-Z0-9_]*[A-Z0-9] Match a letter, then zero or more letters, digits, or underscores, then a letter or digit
) Close group 1

import re
text = "Accept and return @something but reject first@last."
print(re.findall(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I))
Output
['something']
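
The two patterns are not fully interchangeable. The sketch below, using a made-up input `snake_@name` that is not from the question, suggests where they differ: the underscore counts as a word character, so `\B` rejects an @ that follows it, while the lookbehind on `[A-Z0-9]` still accepts it. Which behaviour is wanted depends on the requirements.

```python
import re

lookbehind = re.compile(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)
non_boundary = re.compile(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', re.I)

sample = "snake_@name"  # hypothetical input with an underscore before the "@"

# "_" is not in [A-Z0-9], so the negative lookbehind lets the match through.
print(lookbehind.findall(sample))    # ['name']

# "_" is a word character and "@" is not, so there is a word boundary
# between them and \B fails at that position.
print(non_boundary.findall(sample))  # []

# Both patterns accept an "@" at the very start of the string.
print(lookbehind.findall("@name"), non_boundary.findall("@name"))  # ['name'] ['name']
```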