
Python regex: Line can't start with certain words, can only contain certain characters

I am reading in lines from a file, and I want to remove lines that contain only letters, digits, colons, parentheses, underscores, whitespace and backslashes. This regex was working fine to find those lines...

[^A-Za-z0-9:()_\s\\]

...as passed to re.search() as a raw string.

Now I need to add a further condition: to match, a line must not start with THEN or ELSE. Lines that do start with either word should not match and are thus exempt from removal.

I tried just taking the ^ out of the brackets and adding a negative lookahead before the bracketed expression, like so...

r'^(?!(ELSE|THEN))[A-Za-z0-9:()_\s\\]'

...but now it just matches every line. What am I missing?
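A quick way to see what changed is to run both patterns over a few lines. The sketch below uses made-up sample lines (the real file is not shown here) and hypothetical variable names. Moving the ^ outside the brackets turns the negated class into an ordinary class anchored at the start of the line, so the new pattern only checks that the first character is an allowed one:

```python
import re

# Hypothetical sample lines; the actual file contents are not shown in the question.
samples = ["foo_bar: (baz)", "x = y + 1", "THEN do_it", "ELSE x = 2"]

# Original pattern: negated class, finds any character outside the allowed set.
original = re.compile(r'[^A-Za-z0-9:()_\s\\]')

# Attempted pattern: ^ is now an anchor, and the class is a positive one, so this
# matches whenever the line does not start with ELSE/THEN and its FIRST character
# is in the allowed set.
attempt = re.compile(r'^(?!(ELSE|THEN))[A-Za-z0-9:()_\s\\]')

original_hits = [bool(original.search(s)) for s in samples]
attempt_hits = [bool(attempt.search(s)) for s in samples]

for line, o, a in zip(samples, original_hits, attempt_hits):
    print(f"{line!r:20} original={o} attempt={a}")
```

With these samples, the attempted pattern matches every line that does not begin with THEN or ELSE, whether or not it contains a disallowed character later on.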

Just use an alternation:

^(?:THEN|ELSE|[A-Za-z0-9:()_\s\\]*$)

and remove the lines that don't match the pattern.
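As a rough sketch of how that could look in Python (the sample lines and the helper name `keep_matching` are invented for illustration), `re.match` anchors at the start of each line, and lines that fail the pattern are dropped:

```python
import re

# Pattern from the answer above: a line matches if it starts with THEN or ELSE,
# or if it consists entirely of the allowed characters.
pattern = re.compile(r'^(?:THEN|ELSE|[A-Za-z0-9:()_\s\\]*$)')

def keep_matching(lines):
    """Return only the lines that match the pattern (hypothetical helper)."""
    return [line for line in lines if pattern.match(line)]

# Made-up sample lines, not taken from the original file.
samples = ["foo_bar: (baz)", "x = y + 1", "THEN x = 1", "ELSE do_it"]
kept = keep_matching(samples)
print(kept)
```

Here "x = y + 1" is the only line dropped: it contains characters outside the set and does not start with THEN or ELSE.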

^(?:(?:.*[^A-Za-z0-9:()_\s\\])|(?:THEN|ELSE)).*$

Broken down

^(?:                                        ).*$  #  Starts with
    (?:                      )|(?:         )      #  Either
       .*[^A-Za-z0-9:()_\s\\]                     #  Anything containing a character outside the allowed set
                                  THEN|ELSE       #  THEN/ELSE

See the example on regex101.com
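A small sketch of what this second pattern selects, again with invented sample lines and hypothetical variable names. It only shows which lines match; whether the matching or the non-matching group gets discarded is left to the caller:

```python
import re

# Second pattern from above: a line matches if it contains at least one character
# outside the allowed set, or if it starts with THEN or ELSE.
pattern = re.compile(r'^(?:(?:.*[^A-Za-z0-9:()_\s\\])|(?:THEN|ELSE)).*$')

# Made-up sample lines, not taken from the original file.
samples = ["foo_bar: (baz)", "x = y + 1", "THEN do_it", "ELSE do_it"]

matched = [line for line in samples if pattern.search(line)]
unmatched = [line for line in samples if not pattern.search(line)]

print("matched:  ", matched)
print("unmatched:", unmatched)
```

Only "foo_bar: (baz)" fails to match, since it is built entirely from allowed characters and does not begin with THEN or ELSE.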
