
How to ignore strings that start with a certain pattern using a regular expression in Python?

Accept and return @something but reject first@last.

r'@([A-Z][A-Z0-9_]*[A-Z0-9])'

The above regex accepts @something (the name must start with a letter, end with a letter or digit, may contain underscores in the middle, and must be at least 2 characters long) and returns the part after the @ symbol.

I do not want to return names that have a letter or digit (A-Z, 0-9) directly before the @ symbol, as in first@last.

Spaces, newlines, special characters, etc. before the @ are allowed.

CODE:

re.findall(r'@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)
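To make the problem concrete, here is a quick check of the original pattern. It is a sketch: the first input is the sample sentence from the question, while `@a` is a hypothetical extra case. The pattern also captures `last` from `first@last`, and it skips a one-character name because it needs at least two characters after the @.

```python
import re

# Original pattern from the question: a letter, then letters/digits/underscores,
# ending in a letter or digit (so at least two characters after the @).
PATTERN = r'@([A-Z][A-Z0-9_]*[A-Z0-9])'

text = "Accept and return @something but reject first@last."
print(re.findall(PATTERN, text, re.I))   # ['something', 'last']

# A single character after @ is too short to match.
print(re.findall(PATTERN, "@a", re.I))   # []
```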

Use

re.findall(r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I)

See regex proof.

EXPLANATION

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  @                        '@'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [A-Z]                    any character of: 'A' to 'Z'
--------------------------------------------------------------------------------
    [A-Z0-9_]*               any character of: 'A' to 'Z', '0' to
                             '9', '_' (0 or more times (matching the
                             most amount possible))
--------------------------------------------------------------------------------
    [A-Z0-9]                 any character of: 'A' to 'Z', '0' to '9'
--------------------------------------------------------------------------------
  )                        end of \1
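A small sketch of the lookbehind version checked against the requirements in the question. The helper name `mentions` and all inputs except the first are hypothetical. Note that the explanation lists 'A' to 'Z', but because `re.I` is passed, lowercase letters match as well.

```python
import re

# Lookbehind version from this answer: the @ must not be directly preceded
# by a letter or digit. With re.I, [A-Z] also matches lowercase letters.
PATTERN = r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])'

def mentions(text):
    """Hypothetical helper: names after each @ not glued to a preceding letter/digit."""
    return re.findall(PATTERN, text, re.I)

print(mentions("Accept and return @something but reject first@last."))  # ['something']
print(mentions("@start of a string"))    # ['start']  (nothing before the @ at all)
print(mentions("line one\n@next"))       # ['next']   (newline before @ is allowed)
print(mentions("(@paren) but 42@nope"))  # ['paren']  (digit before @ is rejected)
```

A negative lookbehind also succeeds at the very start of the string, so a name at position 0 is still returned.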

You can use

\B@([A-Z][A-Z0-9_]*[A-Z0-9])

The pattern matches:

  • \B Assert a position where a word boundary does not match
  • @ Match literally
  • ( Capture group 1
    • [A-Z][A-Z0-9_]*[A-Z0-9] Match a letter, then optional letters, digits or underscores, ending with a letter or digit
  • ) Close group 1

Regex demo

import re

text = "Accept and return @something but reject first@last."
print(re.findall(r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])', text, re.I))

Output

['something']
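The two answers are not exactly equivalent. `\B` uses Python's notion of word characters, which includes the underscore and, for str patterns, non-ASCII letters. The lookbehind only rejects characters matched by `[A-Z0-9]` under `re.I`. The comparison below uses hypothetical sample strings to show where they diverge. Which behaviour is correct depends on whether something like `snake_@case` should count as a match.

```python
import re

# The two patterns from the answers, side by side.
LOOKBEHIND = r'(?<![A-Z0-9])@([A-Z][A-Z0-9_]*[A-Z0-9])'
NON_BOUNDARY = r'\B@([A-Z][A-Z0-9_]*[A-Z0-9])'

# Hypothetical inputs chosen to show where the patterns differ.
samples = [
    "reject first@last",   # letter before @: both reject
    "snake_@case",         # underscore before @: only \B rejects
    "caf\u00e9@menu",      # non-ASCII letter before @: only \B rejects
]

for text in samples:
    print(repr(text),
          re.findall(LOOKBEHIND, text, re.I),
          re.findall(NON_BOUNDARY, text, re.I))
```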
