
How to ignore strings that start with a certain pattern using regular expressions in Python?

Accept and return @something but reject first@last.

r'@([A-Z][A-Z0-9_]*[A-Z0-9])'

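The regex above accepts @something (the name starts with a letter, ends with a letter or digit, may contain underscores in between, and is at least 2 characters long) and returns the part after the @ sign.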

I don't want to return a match when a letter or digit (A-Z0-9) comes directly before the @ sign.

Spaces, newlines, special characters, etc. before the @ are allowed.

Code:

re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
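A quick check of the original pattern shows the problem: nothing constrains the character before @, so it also extracts "last" from first@last. The sample sentence is an assumption chosen to mirror the question.

```python
import re

# Original pattern from the question: no restriction on what precedes '@'
PATTERN = r'@([A-Z][A-Z0-9_]*[A-Z0-9])'

text = "Accept and return @something but reject first@last."
matches = re.findall(PATTERN, text, re.I)
print(matches)  # ['something', 'last']; 'last' is the unwanted match
```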

Use

re.findall(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)

See regex proof.
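A small sketch exercising the lookbehind version against the cases the question allows (start of string, space, newline, punctuation before @) and the ones it rejects. The test string is hypothetical, made up for illustration.

```python
import re

# Negative lookbehind: '@' must not be preceded by a letter or digit
PATTERN = r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])'

# Hypothetical input: allowed prefixes first, rejected ones at the end
text = "@start x @space\n@newline (@paren) but not first@last or 2@digit"
found = re.findall(PATTERN, text, re.I)
print(found)  # ['start', 'space', 'newline', 'paren']
```

Because re.I applies to the whole pattern, the lookbehind class [A-Z0-9] also rejects lowercase letters such as the "t" in first@last.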

Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
    [A-Z0-9_]*               any character of: 'A' to 'Z', '0' to
                             '9', '_' (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
  )                        end of \1
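The explanation above maps one-to-one onto a re.VERBOSE layout of the same regex. This is only an alternative way of writing the answer's pattern (not part of the original answer); whitespace and # comments are ignored in verbose mode.

```python
import re

# Same pattern as the answer, annotated piece by piece
PATTERN = re.compile(r"""
    (?<![A-Z0-9])     # not preceded by a letter or digit
    @                 # literal '@'
    (                 # start capture group 1
        [A-Z]         #   first char: a letter
        [A-Z0-9_]*    #   middle: letters, digits, underscores (greedy)
        [A-Z0-9]      #   last char: a letter or digit
    )                 # end of group 1
""", re.VERBOSE | re.IGNORECASE)

print(PATTERN.findall("Accept and return @something but reject first@last."))
# ['something']
```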

You can use

\B@([A-Z][A-Z0-9_]*[A-Z0-9])

The pattern matches:

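  • \B asserts a position where a word boundary does not match
  • @ matches literally
  • ( capture group 1
    • [A-Z][A-Z0-9_]*[A-Z0-9] a letter, then optional letters, digits or underscores, ending with a letter or digit
  • ) close group 1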
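To see why \B rejects first@last, it helps to list every position where \B holds. The helper name non_boundary_positions is hypothetical, introduced only for this illustration. Between a word character ("t") and "@" there is a word boundary, so \B fails there; between a space and "@" (two non-word characters) \B succeeds.

```python
import re

# Hypothetical helper: indices where \B (non-word-boundary) matches
def non_boundary_positions(s):
    return [m.start() for m in re.finditer(r'\B', s)]

print(non_boundary_positions("first@last"))  # [1, 2, 3, 4, 7, 8, 9]; index 5 ('@') absent
print(non_boundary_positions("x @some"))     # [2, 4, 5, 6]; index 2 ('@') present
```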

Regex demo

import re

text = "Accept and return @something but reject first@last."
print(re.findall(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I))

Output

['something']
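The two answers are not strictly equivalent. The lookbehind only rejects A-Z0-9 before @, while \B also treats the underscore as a word character (\w includes "_"), so they disagree on first_@last. The test strings below are hypothetical.

```python
import re

LOOKBEHIND = r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])'
NON_BOUNDARY = r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])'

results = {}
for text in ["first@last", "first_@last", "(@ok)"]:
    # (lookbehind result, \B result) for each sample
    results[text] = (re.findall(LOOKBEHIND, text, re.I),
                     re.findall(NON_BOUNDARY, text, re.I))
    print(repr(text), results[text])
# 'first@last' ([], [])
# 'first_@last' (['last'], [])
# '(@ok)' (['ok'], ['ok'])
```

If an underscore directly before @ should also disqualify a match, \B (or adding _ to the lookbehind class) fits; otherwise the lookbehind is the more literal reading of "a letter or digit before @".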

