
(Python) regex: Match everything which is NOT in a (static) list of strings

Let's say I have the (static) list ['DOG', 'CAT', 'LEOPARD'] (strings of possibly different lengths).

I know how to construct the regular expression that matches pairs of comma-separated animals that belong to this list:

from re import search
search('^(DOG|CAT|LEOPARD),(DOG|CAT|LEOPARD)$', 'DOG,LEOPARD') #-> Match
search('^(DOG|CAT|LEOPARD),(DOG|CAT|LEOPARD)$', 'LEOPARD,WHALE') #-> No match
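Since the question mentions writing a regular expression constructor, here is a minimal sketch of generating the positive pattern from the list instead of typing it out. The helper names animal_alternation and pair_pattern are hypothetical. re.escape is used on the assumption that list entries might someday contain regex metacharacters. re.fullmatch replaces the ^...$ anchors.

```python
import re

ANIMALS = ['DOG', 'CAT', 'LEOPARD']

def animal_alternation(words):
    """Hypothetical helper: a non-capturing group matching exactly one of `words`."""
    # Escape each word so characters like '.' or '+' are matched literally.
    return '(?:' + '|'.join(re.escape(w) for w in words) + ')'

def pair_pattern(words):
    """Hypothetical helper: two listed words separated by a comma."""
    group = animal_alternation(words)
    return group + ',' + group

PAIR = re.compile(pair_pattern(ANIMALS))

print(bool(PAIR.fullmatch('DOG,LEOPARD')))    # True
print(bool(PAIR.fullmatch('LEOPARD,WHALE')))  # False
```

One small difference is worth noting. With re.search, the $ anchor also matches just before a trailing newline. fullmatch requires the whole string to match.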

I now want a regular expression that matches pairs of animals where neither of them belongs to my animal list. Using a dummy negation operator !, what I want is:

from re import search
search('^!(DOG|CAT|LEOPARD),!(DOG|CAT|LEOPARD)$', 'DOG,LEOPARD') #-> No match
search('^!(DOG|CAT|LEOPARD),!(DOG|CAT|LEOPARD)$', 'CHIMP,WHALE') #-> Match

Does such an operator exist?

If not, is there a simple way to construct such an operator by chaining existing ones? (I am writing a regular expression constructor, so neither the readability nor the length of the regex is an important factor here.)

Note: I am aware that I am asking a lot of my regular expression engine.

Note 2: I am not interested in solutions that do not rely on regular expressions, as this problem is part of a much larger one that I am already solving with (very complex) regular expressions.

Instead of using a regex, you can use sets and test the intersection:

>>> a = {'DOG', 'CAT', 'LEOPARD'}
>>> b = set('DOG,LEOPARD'.split(','))
>>> not a.intersection(b)  # empty intersection means neither animal is in the list
False

Why not use strings and built-in functions instead of regular expressions?

def matcher(no, s):
    # True when none of the comma-separated words in s is in the set `no`
    return not any(word in no for word in s.split(','))

Result:

>>> matcher({'DOG', 'CAT', 'LEOPARD'}, 'DOG,LEOPARD')
False
>>> matcher({'DOG', 'CAT', 'LEOPARD'}, 'CHIMP,WHALE')
True

You're looking for lookarounds (specifically, negative lookaheads):

^(?!(?:DOG|CAT|LEOPARD),)[^,]+,(?!(?:DOG|CAT|LEOPARD)$)[^,]+$
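A quick way to check the pattern is to run it against the question's examples plus a few edge cases. The sketch below copies the pattern verbatim. The extra inputs CHIMP,CAT and DOGFISH,CATFISH are illustrative choices: one has a listed animal in the second slot, and one has words that only start with a listed animal.

```python
import re

# The lookaround pattern from the answer, copied verbatim.
NOT_IN_LIST = re.compile(r'^(?!(?:DOG|CAT|LEOPARD),)[^,]+,(?!(?:DOG|CAT|LEOPARD)$)[^,]+$')

cases = ['DOG,LEOPARD', 'CHIMP,WHALE', 'LEOPARD,WHALE', 'CHIMP,CAT', 'DOGFISH,CATFISH']
for text in cases:
    print(text, bool(NOT_IN_LIST.search(text)))

# Expected output:
# DOG,LEOPARD False
# CHIMP,WHALE True
# LEOPARD,WHALE False
# CHIMP,CAT False
# DOGFISH,CATFISH True
```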

Pattern breakdown:

^     assert position at start of string
(?!   assert the following text does NOT match...
    (?:DOG|CAT|LEOPARD) ...one of these 3 words...
    ,   ...followed by a comma. The comma is essential, because it makes sure that the text
           IS dog or cat or leopard. Without the comma, the regex would check if the text
           STARTS WITH dog, cat or leopard.
)
[^,]+   if we've reached this point, we know the animal isn't cat, dog or leopard. Match up
        until the next comma.
,       consume the comma
(?!     same as before, except this time...
    (?:DOG|CAT|LEOPARD)
    $   ...assert end of string instead of comma
)
[^,]+
$
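For a regex constructor, the "dummy operator !" from the question can be emulated with a small generator. It emits a negative lookahead that contains the word list followed by whatever must come after the field. The helper name not_one_of is hypothetical, and the sketch assumes fields never contain commas, as in the answer. The last part shows why the breakdown calls the terminator essential.

```python
import re

def not_one_of(words, terminator):
    """Hypothetical helper: match one comma-free field that is NOT exactly one
    of `words`. `terminator` is the regex that must follow the field (',' or '$');
    putting it inside the lookahead turns a prefix test into an exact-word test."""
    alternation = '|'.join(re.escape(w) for w in words)
    return '(?!(?:' + alternation + ')' + terminator + ')[^,]+'

ANIMALS = ['DOG', 'CAT', 'LEOPARD']
pattern = re.compile('^' + not_one_of(ANIMALS, ',') + ',' + not_one_of(ANIMALS, '$') + '$')

print(bool(pattern.search('CHIMP,WHALE')))    # True
print(bool(pattern.search('DOG,LEOPARD')))    # False
print(bool(pattern.search('DOGFISH,WHALE')))  # True: DOGFISH is not DOG

# Without the terminator, the lookahead only checks the prefix, so words that
# merely START with a listed animal are rejected too.
prefix_only = re.compile(r'^(?!(?:DOG|CAT|LEOPARD))[^,]+,(?!(?:DOG|CAT|LEOPARD))[^,]+$')
print(bool(prefix_only.search('DOGFISH,WHALE')))  # False
```

For this list, the generated pattern is character-for-character identical to the one in the answer.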

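Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.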

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM