
regex: find all words with certain letters but not others

Can anyone help me with this?

I need to find all words in a list that contain the letters [t OR d] AND [k OR c], but none of [s, z, n, m].

I figured out the first part, but I don't know how to include the stop list:

\w*[t|d]\w*[k|c]\w*

in Python notation.

Thank you in advance.

You can do this in two steps. First find the words with t|d AND k|c, then filter out the matches that contain unwanted letters.

Since you said you figured out the first part, here is the second:

matches = [i for i in matches if not re.search(r'[sznm]', i)]    
print(matches) 
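For completeness, here is a minimal sketch of both steps together. The sample word list is made up for illustration, and the first step uses two separate searches on the assumption that the order of the letters does not matter.

```python
import re

# Hypothetical sample words, not taken from the question.
words = ["detected", "dot", "knight", "track", "stack", "cat"]

# Step 1: keep words containing (t or d) and (k or c), in any order.
matches = [w for w in words if re.search(r"[td]", w) and re.search(r"[kc]", w)]

# Step 2: drop words containing any of the unwanted letters.
matches = [w for w in matches if not re.search(r"[sznm]", w)]
print(matches)  # ['detected', 'track', 'cat']
```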

If you need the t or d to appear before the k or c, use: [^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*

[^sznm\s\d] means any character except s, z, n, m, whitespace characters (\s), or digits (\d). In a Python raw string a single backslash is enough.
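A sketch of how that pattern might be applied to a list of words, written as a Python raw string and checked with re.fullmatch so the whole word has to fit. The sample words are hypothetical. Note that this form only accepts words where a t or d comes before a k or c.

```python
import re

# Ordered pattern: (t or d) must come before (k or c); none of s, z, n, m,
# whitespace or digits may appear anywhere in the word.
ordered = re.compile(r"[^sznm\s\d]*[td][^sznm\s\d]*[kc][^sznm\s\d]*")

# Hypothetical sample words.
words = ["track", "detected", "cat", "knight", "dock2"]

ordered_matches = [w for w in words if ordered.fullmatch(w)]
print(ordered_matches)  # ['track', 'detected']
```

Here "cat" is rejected because its c comes before its t, "knight" because of the n, and "dock2" because of the digit.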

s = "foobar foo".split()

allowed = ({"k", "c"}, {"r", "d"})
forbid = {"s","c","z","m"}

for word in s:
    if all(any(k in st for k in word) for st in allowed) and all(k not in forbid for k in word):
        print(word)

Or using a list comprehension with set.intersection:

words = [word for word in s if all(st.intersection(word) for st in allowed) and not forbid.intersection(word)]

Based on Padraic's answer.

EDIT: We both missed this condition:

[t OR d] AND [k OR c]

So, fixed accordingly:

s = "detected dot knight track"

allowed = ({"t","d"},{"k","c"})
forbidden = {"s","z","n", "m"}

for word in s.split():
    letter_set = set(word)
    if all(letter_set & a for a in allowed) and letter_set - forbidden == letter_set:
        print(word)

And the result is:

detected
track
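The same check could be wrapped in a reusable function. This is only a sketch: the name filter_words and its parameters are made up here, and set.isdisjoint is swapped in for the set-difference comparison because it states the "no forbidden letters" test more directly.

```python
def filter_words(words, required_groups, forbidden):
    """Return words with a letter from every required group and none forbidden."""
    result = []
    for word in words:
        letters = set(word)
        # Every group must share at least one letter with the word.
        has_all_groups = all(letters & group for group in required_groups)
        # isdisjoint is True when the word has no forbidden letter at all.
        if has_all_groups and letters.isdisjoint(forbidden):
            result.append(word)
    return result


found = filter_words(
    "detected dot knight track".split(),
    ({"t", "d"}, {"k", "c"}),
    {"s", "z", "n", "m"},
)
print(found)  # ['detected', 'track']
```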

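Use this code (the \b word boundaries keep the pattern from matching the tail of a word that starts with an excluded letter):

import re
# The character class is the lowercase alphabet without s, z, n and m
re.findall(r'\b[abcdefghijklopqrtuvwxy]*[td][abcdefghijklopqrtuvwxy]*[kc][abcdefghijklopqrtuvwxy]*\b', text)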
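Rather than typing the allowed alphabet by hand, the character class could be generated from the forbidden letters. This is a sketch under a few assumptions: lowercase ASCII input only, a made-up sample text, and \b word boundaries added so that partial words are not matched.

```python
import re
import string

forbidden = set("sznm")

# Build "[abc...]" from the lowercase alphabet minus the forbidden letters.
allowed_class = "[" + "".join(
    c for c in string.ascii_lowercase if c not in forbidden
) + "]"

# (t or d) followed later by (k or c), all within one word.
pattern = rf"\b{allowed_class}*[td]{allowed_class}*[kc]{allowed_class}*\b"

text = "detected dot knight track stack"  # hypothetical sample text
explicit_matches = re.findall(pattern, text)
print(explicit_matches)  # ['detected', 'track']
```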

I really like the answer by @padraic-cunningham, which does not use re, but here is a pattern that will work:

pattern = r'(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*'

Positive (?=...) and negative (?!...) lookahead assertions are well documented on python.org.
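A short sketch of how the lookahead pattern might be used, with hypothetical sample data. Applied word by word with re.fullmatch it works as is. When scanning running text with re.findall, a leading \b is added here (an assumption, not part of the pattern above) so that a match cannot start in the middle of a word.

```python
import re

pattern = re.compile(r'(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w*')

# Word-by-word check: the whole word must satisfy all three lookaheads.
words = ["detected", "dot", "knight", "track", "cat", "stack"]
lookahead_matches = [w for w in words if pattern.fullmatch(w)]
print(lookahead_matches)  # ['detected', 'track', 'cat']

# Running text: anchor each attempt at a word boundary.
text = "detected dot knight track cat stack"
in_text = re.findall(r'\b(?!\w*[sznm])(?=\w*[td])(?=\w*[kc])\w+', text)
print(in_text)  # ['detected', 'track', 'cat']
```

Unlike the ordered pattern, the lookaheads do not care which of the two letter groups comes first, so "cat" is accepted.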

You need to use lookarounds.

^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$

i.e.,

>>> import re
>>> l = ['fooktz', 'foocdm', 'foobar', 'kbard']
>>> [i for i in l if re.match(r'^(?=.*[td])(?!.*[sznm])\w*[kc]\w*$', i)]
['kbard']
