I am reading in lines from a file, and I want to remove lines that contain only letters, digits, colons, parentheses, underscores, whitespace and backslashes. This regex was working fine to find those lines...
[^A-Za-z0-9:()_\s\\]
...as passed to re.search() as a raw string.
Now I need to add a condition: lines that start with THEN or ELSE should not match, and should therefore be exempt from removal.
I tried just taking the ^ out of the brackets and adding a negative lookahead before the bracketed expression, like so...
r'^(?!(ELSE|THEN))[A-Za-z0-9:()_\s\\]'
...but now it just matches every line. What am I missing?
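The attempt above only tests the first character of each line, which may be why it seems to match everything. Here is a minimal sketch of that behaviour, assuming the goal is to flag lines made up entirely of the allowed characters unless they start with THEN or ELSE. The `anchored` pattern and the sample lines are illustrative assumptions, not part of the original post:

```python
import re

# Character class copied from the question.
ALLOWED = r"[A-Za-z0-9:()_\s\\]"

# The attempt from the question: after the lookahead, only ONE character is checked.
attempt = re.compile(r"^(?!(ELSE|THEN))" + ALLOWED)

# Hypothetical variant: repeat the class up to the end of the line with *$.
anchored = re.compile(r"^(?!ELSE|THEN)" + ALLOWED + r"*$")

# Made-up sample lines, for illustration only.
samples = [
    "foo_bar (baz)",      # only allowed characters
    "x = y + 1;",         # contains '=', '+' and ';'
    "THEN do_something",  # starts with THEN
]

attempt_hits = [bool(attempt.search(s)) for s in samples]
anchored_hits = [bool(anchored.search(s)) for s in samples]

print(attempt_hits)   # [True, True, False]
print(anchored_hits)  # [True, False, False]
```

The attempt matches any line whose first character is in the class, whereas the anchored variant requires every character up to the end of the line to be in the class.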
Just use an alternation:
^(?:THEN|ELSE|[A-Za-z0-9:()_\s\\]*$)
and remove the lines that don't match the pattern.
^(?:(?:.*[^A-Za-z0-9:()_\s\\])|(?:THEN|ELSE)).*$
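Broken down:
^(?: ... ).*$             # Anchored at the start of the line; after the group, .*$ consumes the rest
(?: ... )|(?: ... )       # The group is one of two alternatives
.*[^A-Za-z0-9:()_\s\\]    # Any text followed by a character outside the allowed set
THEN|ELSE                 # A literal THEN or ELSE at the start of the line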
See the example on regex101.com
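As a rough sketch of how the pattern above might be used from Python, assuming that matching lines are the ones to keep and everything else is dropped. The helper name `filter_lines` and the sample lines are hypothetical:

```python
import re

# Pattern from the answer above: matches lines that contain a character outside
# the allowed set, or that start with THEN or ELSE.
KEEP = re.compile(r"^(?:(?:.*[^A-Za-z0-9:()_\s\\])|(?:THEN|ELSE)).*$")

def filter_lines(lines):
    """Return only the lines that match KEEP (hypothetical helper)."""
    return [line for line in lines if KEEP.search(line)]

# Made-up sample lines, for illustration only.
sample = [
    "foo_bar (baz)",      # only allowed characters: dropped
    "x = y + 1;",         # contains other characters: kept
    "THEN do_something",  # starts with THEN: kept
    "ELSE",               # starts with ELSE: kept
]

kept = filter_lines(sample)
print(kept)  # ['x = y + 1;', 'THEN do_something', 'ELSE']
```

Note that the pattern has no word boundary after THEN or ELSE, so a line starting with a longer word such as THENCE would also be kept. Adding `\b` after the alternation is one possible way to tighten that, if it matters for the input.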