Using regex in Python to find words that contain certain characters and do not contain others

Question

First off, I am new to regex and am using https://regex101.com/r/arkkVE/3 to help me learn it.

I'd like to find words from a .txt file that I have using re. So far I am able to do this, but it is very verbose and I am trying to cut back on repeated sequences of regex expressions.

Currently this is what I have:

import re

Possibility = []
with open('5LetterWords.txt') as f:
    for line in f:  # iterate over the file directly instead of readlines()
        Possibility += re.findall(
            r'(?=\w)(?=.*[@#t])[\w]+(?=\w)(?=.*[@#o])[\w]+(?=\w)(?=.*[@#u])[\w]+',
            line)
    print(Possibility)

This finds words that have the letters "t", "o" and "u" in no particular order, which is the first step in what I want.

I want to add more regex expressions that will omit words containing certain other characters, but I don't know how to exclude using regex.

As you can see, this is starting to get really long and ugly.

Should I be using regex? Is there a better/more concise way to solve this problem?

Thanks

Answer 1

Ideally you would read the file line by line and check each word for the existence of t, o, and u, and additionally check that a does not exist.

I'm not a Python dev but this seems relevant: https://stackoverflow.com/a/5189069/2191572

if ('t' in word) and ('o' in word) and ('u' in word) and ('a' not in word):
    print('yay')
else:
    print('nay')
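
The check above can be wrapped in a small helper and applied to a file one line at a time. This is only a minimal sketch, not the answerer's code: the name `filter_words` and its `required`/`forbidden` parameters are made up for illustration, and it assumes the file holds one word per line.

```python
def filter_words(lines, required="tou", forbidden="a"):
    """Return words containing every letter in `required` and none in `forbidden`."""
    matches = []
    for line in lines:
        word = line.strip()  # drop the trailing newline
        if not word:
            continue  # skip blank lines
        has_all = all(ch in word for ch in required)
        has_none = not any(ch in word for ch in forbidden)
        if has_all and has_none:
            matches.append(word)
    return matches


# A file object is an iterable of lines, so it can be passed directly:
# with open('5LetterWords.txt') as f:
#     print(filter_words(f))
sample = ["about\n", "stout\n", "touch\n", "hello\n"]
print(filter_words(sample))
```

Taking any iterable of lines keeps the helper easy to try on a short list before pointing it at the real file.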

If you insist on regex, then this would work:

^(?=.*t)(?=.*o)(?=.*u)(?!.*a).*$

^ - start line anchor
(?=.*t) - ahead of me there exists a t
(?=.*o) - ahead of me there exists an o
(?=.*u) - ahead of me there exists a u
(?!.*a) - ahead of me there are no a's
.* - match everything
$ - end line anchor

Note: (?!.*a).* can be substituted with [^a]*

https://regex101.com/r/WtVr8S/1
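
To avoid typing one lookahead per letter by hand, the pattern can be assembled from the letters themselves. The following is a rough sketch under a few assumptions: `build_pattern` is a hypothetical helper name, the input has one word per line, and `re.MULTILINE` is used so that `^` and `$` anchor at each line rather than only at the ends of the whole string.

```python
import re


def build_pattern(required, forbidden=""):
    """Build ^(?=.*x)...(?!.*y)....*$ from the given letters (hypothetical helper)."""
    lookaheads = "".join(f"(?=.*{re.escape(c)})" for c in required)
    exclusions = "".join(f"(?!.*{re.escape(c)})" for c in forbidden)
    # MULTILINE makes ^ and $ match at the start and end of every line
    return re.compile(f"^{lookaheads}{exclusions}.*$", re.MULTILINE)


pattern = build_pattern("tou", "a")
print(pattern.pattern)

text = "about\nstout\ntouch\nhello\n"
print(pattern.findall(text))
```

For the letters t, o, u and the excluded letter a, this produces the same expression as the one explained above, so the whole file can be searched with a single `findall` call on its contents.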

Answer 2

I guess you could iterate through your list of words and keep or skip each word depending on its letters, for example:

words = ['about', 'alout', 'aotus', 'apout', 'artou', 'atour', 'blout', 'bottu', 'bouet', 'boult', 'bouto', 'bouts', 'chout', 'clout', 'count', 'court', 'couth', 'crout', 'donut', 'doubt', 'flout', 'fotui', 'fount', 'foute', 'fouth', 'fouty', 'glout', 'gouty', 'gouts', 'grout', 'hoult', 'yourt', 'youth', 'joust', 'keout', 'knout', 'lotus', 'louty', 'louts', 'montu', 'moult', 'mount', 'mouth', 'nobut', 'notum', 'notus', 'plout', 'pluto', 'potus', 'poult', 'pouty', 'pouts', 'roust', 'route', 'routh', 'routs', 'scout', 'shout', 'skout', 'smout', 'snout', 'south', 'spout', 'stoun', 'stoup', 'stour', 'stout', 'tatou', 'taupo', 'thous', 'throu', 'thuoc', 'todus', 'tofus', 'togue', 'tolus', 'tonus', 'topau', 'toque', 'torus', 'totum', 'touch', 'tough', 'tould', 'tourn', 'tours', 'tourt', 'touse', 'tousy', 'toust', 'touts', 'troue', 'trout', 'trouv', 'tsubo', 'voust']
result = []
for word in words:
    if ('a' in word) or ('y' in word):
        continue  # skip words containing an excluded letter
    elif ('t' in word) and ('u' in word) and ('o' in word):
        result.append(word)  # keep words containing all required letters
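
The same filter can also be written with sets, which keeps the condition short as the lists of letters grow. This is just one possible sketch: the names `required`, `forbidden` and `keep` are illustrative, and the short `sample` list stands in for the full `words` list.

```python
required = set("tou")   # letters every kept word must contain
forbidden = set("ay")   # letters no kept word may contain


def keep(word):
    letters = set(word)
    # required <= letters: every required letter is present
    # isdisjoint: no forbidden letter is present
    return required <= letters and forbidden.isdisjoint(letters)


sample = ['about', 'donut', 'youth', 'count', 'touch']
result = [w for w in sample if keep(w)]
print(result)
```

Adding or removing a letter then only means editing the two strings at the top.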

Answer 1
0 2022-08-05 14:29:34

Answer 2
0 2022-08-05 14:32:47
