(Python) regex: Match everything that is NOT in a (static) list of strings

Let's say I have the (static) list ['DOG', 'CAT', 'LEOPARD'] (strings of possibly different lengths).

I know how to construct the regular expression that matches pairs of comma-separated animals that belong to this list:

from re import search
search('^(DOG|CAT|LEOPARD),(DOG|CAT|LEOPARD)$', 'DOG,LEOPARD') #-> Match
search('^(DOG|CAT|LEOPARD),(DOG|CAT|LEOPARD)$', 'LEOPARD,WHALE') #-> No match
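Since the asker is writing a regular expression constructor, here is a minimal sketch of generating the alternation above from the list itself. Assumptions not in the original: the helper name `alternation` is hypothetical, each word goes through `re.escape` in case it contains regex metacharacters, and a non-capturing group `(?:...)` replaces the original capturing one.

```python
import re

ANIMALS = ['DOG', 'CAT', 'LEOPARD']

def alternation(words):
    """Return a non-capturing group that matches any one of `words` literally."""
    # re.escape guards against words containing regex metacharacters.
    return '(?:' + '|'.join(re.escape(w) for w in words) + ')'

animal = alternation(ANIMALS)
pair = re.compile('^' + animal + ',' + animal + '$')

print(pair.pattern)                        # ^(?:DOG|CAT|LEOPARD),(?:DOG|CAT|LEOPARD)$
print(bool(pair.search('DOG,LEOPARD')))    # True
print(bool(pair.search('LEOPARD,WHALE')))  # False
```

Building the group once and reusing it keeps the two halves of the pair pattern in sync when the list changes.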

Now I want a regular expression that matches pairs of animals where neither animal belongs to my list. Using a dummy operator !, what I want is:

from re import search
search('^!(DOG|CAT|LEOPARD),!(DOG|CAT|LEOPARD)$', 'DOG,LEOPARD') #-> No match
search('^!(DOG|CAT|LEOPARD),!(DOG|CAT|LEOPARD)$', 'CHIMP,WHALE') #-> Match

Does such an operator exist?

If not, is there a simple way to construct such an operator by chaining existing ones? (I am writing a regular expression constructor, so neither readability nor length of the regex is an important factor here.)

Note: I am aware that I am asking a lot of my regular expression engine.

Note 2: I am not interested in solutions that do not rely on regular expressions, as this problem is part of a much larger one that I am already solving with (very complex) regular expressions.

Instead of using a regex, you can use sets and test whether their intersection is empty:

>>> a = {'DOG', 'CAT', 'LEOPARD'}
>>> b = set('DOG,LEOPARD'.split(','))
>>> not a.intersection(b)  # True only if neither animal is in the list
False

Why not use strings and built-in functions instead of regular expressions?

def matcher(no, s):
    # True when none of the comma-separated words in s appears in the excluded set no
    return not any(word in no for word in s.split(','))

Result:

>>> matcher({'DOG', 'CAT', 'LEOPARD'}, 'DOG,LEOPARD')
False
>>> matcher({'DOG', 'CAT', 'LEOPARD'}, 'CHIMP,WHALE')
True

You're looking for lookarounds (specifically, negative lookaheads):

^(?!(?:DOG|CAT|LEOPARD),)[^,]+,(?!(?:DOG|CAT|LEOPARD)$)[^,]+$
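A quick check of this pattern with Python's re module, using the question's two inputs plus two extra cases (added here for illustration, not from the question) that exercise the comma and the $ inside the lookaheads:

```python
import re

# The lookahead pattern from this answer, unchanged.
NOT_LISTED_PAIR = re.compile(
    r'^(?!(?:DOG|CAT|LEOPARD),)[^,]+,(?!(?:DOG|CAT|LEOPARD)$)[^,]+$'
)

cases = {
    'DOG,LEOPARD': False,      # both animals are in the list
    'CHIMP,WHALE': True,       # neither animal is in the list
    'CHIMP,CAT': False,        # only the second animal is in the list
    'DOGFISH,CATFISH': True,   # merely starting with a listed word does not count
}
for text, expected in cases.items():
    assert bool(NOT_LISTED_PAIR.search(text)) is expected, text
print('all cases behave as expected')
```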

Pattern breakdown:

^     assert position at start of string
(?!   assert the following text does NOT match...
    (?:DOG|CAT|LEOPARD) ...one of these 3 words...
    ,   ...followed by a comma. The comma is essential, because it makes sure that the text
           IS dog or cat or leopard. Without the comma, the regex would check if the text
           STARTS WITH dog, cat or leopard.
)
[^,]+   if we've reached this point, we know the animal isn't cat, dog or leopard. Match up
        until the next comma.
,       consume the comma
(?!     same as before, except this time...
    (?:DOG|CAT|LEOPARD)
    $   ...assert end of string instead of comma
)
[^,]+
$
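Because the question mentions a regular expression constructor, the same construction can be generated for any word list and any number of comma-separated fields. This is a sketch under stated assumptions rather than part of the original answer: the helper names `not_one_of` and `excluded_fields_pattern` are hypothetical, fields are assumed never to contain commas, and words are escaped with `re.escape`.

```python
import re

def not_one_of(words, terminator):
    """Pattern for one comma-free field that is NOT exactly one of `words`.

    `terminator` is what must follow the field (',' or '$'). Putting it inside
    the negative lookahead turns "does not start with" into "is not".
    """
    group = '(?:' + '|'.join(re.escape(w) for w in words) + ')'
    return '(?!' + group + terminator + ')[^,]+'

def excluded_fields_pattern(words, n_fields):
    """Anchored pattern for n_fields comma-separated fields, none of them in words."""
    if n_fields < 1:
        raise ValueError('n_fields must be at least 1')
    # Every field except the last is followed by a comma; the last by end of string.
    parts = [not_one_of(words, ',') for _ in range(n_fields - 1)]
    parts.append(not_one_of(words, '$'))
    return '^' + ','.join(parts) + '$'

pattern = excluded_fields_pattern(['DOG', 'CAT', 'LEOPARD'], 2)
print(pattern)  # ^(?!(?:DOG|CAT|LEOPARD),)[^,]+,(?!(?:DOG|CAT|LEOPARD)$)[^,]+$
print(bool(re.search(pattern, 'CHIMP,WHALE')))  # True
print(bool(re.search(pattern, 'DOG,LEOPARD')))  # False
```

For two fields this reproduces the hand-written pattern exactly; for more fields it simply repeats the comma-terminated lookahead for every field except the last.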
