regex: find all words with certain letters but not others
Can anyone help me with this?
I need to find all words in a list that contain the letters [t OR d] AND [k OR c], but none of [s, z, n, m].
I figured out the first part, but I don't know how to include the stop list:
\w*[t|d]\w*[k|c]\w*
in Python notation.
Thank you in advance.
You can use two steps: first find the words with t|d AND k|c, then filter out the matches that contain unwanted letters.
Since you said you figured out the first part, here is the second:
matches = [i for i in matches if not re.search(r'[sznm]', i)]
print(matches)
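Putting both steps together, a minimal sketch might look like the following. The word list is hypothetical (the question does not show its data), and the first step is written here as two independent searches so that the order of the letters does not matter, which is an assumption about what the asker wants.

```python
import re

# Hypothetical sample data; the original question does not show its word list.
words = ["track", "detected", "knight", "dot", "stack", "cat", "kid"]

# Step 1: keep words containing (t or d) AND (k or c), in either order.
has_td = re.compile(r"[td]")
has_kc = re.compile(r"[kc]")
matches = [w for w in words if has_td.search(w) and has_kc.search(w)]

# Step 2: drop words containing any forbidden letter.
matches = [w for w in matches if not re.search(r"[sznm]", w)]
print(matches)
```

Here "knight" and "stack" pass the first step but are removed by the second, and "dot" never passes the first.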
If you need the t or d to appear before the k or c, use: [^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*
[^sznm\s\d] means any character except s, z, n, m, whitespace characters (\s), or digits (\d).
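As a rough illustration of the ordered pattern, the sketch below applies it to whole words with re.fullmatch. The sample words are hypothetical, and the pattern is written as a raw string with single backslashes, which is an assumption about how it would be used in Python.

```python
import re

# Ordered variant: a t/d must appear somewhere before a k/c.
ordered = re.compile(r"[^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*")

# Hypothetical sample words, chosen only for illustration.
words = ["track", "detected", "kid", "cat", "stack", "dot"]

# fullmatch() requires the whole word to fit the pattern, so a forbidden
# letter anywhere in the word rejects it.
result = [w for w in words if ordered.fullmatch(w)]
print(result)
```

Note that "kid" and "cat" are rejected here only because their k/c comes before the t/d; drop the ordering requirement if that is not what you want.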
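s = "foobar foo".split()
# One letter from each set is required: (k or c) and (t or d).
allowed = ({"k", "c"}, {"t", "d"})
# No letter from this set may appear.
forbid = {"s", "z", "n", "m"}
for word in s:
    if all(any(k in st for k in word) for st in allowed) and all(k not in forbid for k in word):
        print(word)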
Or using a list comp with set.intersection:
words = [word for word in s if all(st.intersection(word) for st in allowed) and not forbid.intersection(word)]
Based on Padraic's answer.
EDIT: We both missed this condition:
[t OR d] AND [k OR c]
So, fixed accordingly:
s = "detected dot knight track"
allowed = ({"t","d"},{"k","c"})
forbidden = {"s","z","n", "m"}
for word in s.split():
    letter_set = set(word)
    if all(letter_set & a for a in allowed) and letter_set - forbidden == letter_set:
        print(word)
And the result is:
detected
track
Use this code:
import re
re.findall('[abcdefghijklopqrtuvwxy]*[td][abcdefghijklopqrtuvwxy]*[kc][abcdefghijklopqrtuvwxy]*', text)
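One caveat with running this pattern over free text: because it is not anchored, findall can return a fragment of a word that contains a forbidden letter. The sketch below shows that, and a possible fix using \b word boundaries. The boundaries and the sample text are additions for illustration, not part of the answer above.

```python
import re

letters = "[abcdefghijklopqrtuvwxy]"

# Pattern as given in the answer (built from the same character class).
loose = re.compile(f"{letters}*[td]{letters}*[kc]{letters}*")

# Same pattern wrapped in \b so a match must cover a whole word
# (the \b anchors are an assumption added here).
strict = re.compile(rf"\b{letters}*[td]{letters}*[kc]{letters}*\b")

text = "stack track"  # hypothetical sample text

loose_found = loose.findall(text)    # picks up "tack" from inside "stack"
strict_found = strict.findall(text)  # only whole words qualify
print(loose_found, strict_found)
```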
I really like the answer by @padraic-cunningham, which does not use re, but here is a pattern that will work:
pattern = r'(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*'
Positive (?=...) and negative (?!...) lookahead assertions are well documented on python.org.
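A short sketch of how the lookahead pattern could be applied to a list of words, assuming each word is tested on its own with fullmatch. The sample list is hypothetical.

```python
import re

# Lookahead pattern from the answer: no s/z/n/m anywhere, at least one t/d,
# and at least one k/c, in any order.
pattern = re.compile(r"(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*")

# Hypothetical sample words.
words = ["detected", "dot", "knight", "track", "stack", "kid"]

# Each lookahead is checked from the start of the word without consuming
# characters; the final \w* then consumes the whole word.
result = [w for w in words if pattern.fullmatch(w)]
print(result)
```

Because the lookaheads are independent of each other, "kid" is accepted even though its k comes before its d.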
You need to use lookarounds.
^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$
i.e.,
>>> import re
>>> l = ['fooktz', 'foocdm', 'foobar', 'kbard']
>>> [i for i in l if re.match(r'^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$', i)]
['kbard']