How to find words that do not contain specific letters?

Question

I'm trying to write code that uses a regex on my text file. The file contains these words, one per line:

nana
abab
nanac
eded

My goal is to display the words that do not contain any of the letters in a given substring.

For example, if my substring is "bn", the output should be only eded, because nana and nanac contain "n" and abab contains "b".

I have written some code, but it only checks the first letter of my substring:

import re

substring = "bn"
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                for letter in substring:
                    if len(re.findall(letter, word)) == 0:
                        print(word)
                        #yield word
xstring()

How do I solve this problem?

Answer 1

Here, a simple expression such as the following is enough:

^[^bn]+$

We put b and n in a negated character class, [^bn], which matches any other character. Adding the ^ and $ anchors then rejects every string that contains b or n.

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^[^bn]+$"

test_str = ("nana\n"
    "abab\n"
    "nanac\n"
    "eded")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum=matchNum, start=match.start(), end=match.end(), match=match.group()))

    # The pattern has no capture groups, so this loop does nothing here
    for groupNum in range(1, len(match.groups()) + 1):
        print("Group {groupNum} found at {start}-{end}: {group}".format(groupNum=groupNum, start=match.start(groupNum), end=match.end(groupNum), group=match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
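
The same idea can be applied to any set of letters rather than a hard-coded "bn". The following is a minimal sketch, not part of the original answer: the function name `words_without` is made up for illustration, and it assumes `letters` is non-empty. It uses `re.fullmatch`, which plays the role of the ^ and $ anchors for a single word.

```python
import re

def words_without(letters, words):
    """Return the words that contain none of the characters in `letters`."""
    # re.escape keeps characters such as ']' or '-' literal inside the class.
    # fullmatch requires the whole word to consist of allowed characters.
    pattern = re.compile('[^%s]+' % re.escape(letters))
    return [w for w in words if pattern.fullmatch(w)]

print(words_without("bn", ["nana", "abab", "nanac", "eded"]))  # ['eded']
```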

RegEx

If this expression isn't what you want, it can be modified at regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Answer 2

@Xosrov has the right approach, with a few minor issues and typos. The version below implements the same logic and works:

import re

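def xstring(substring, words):
    # Build a character class from the distinct letters; re.escape guards
    # against characters such as ']' or '^' that are special inside [...]
    regex = re.compile('[%s]' % re.escape(''.join(sorted(set(substring)))))
    # Print only the words that contain none of those letters
    for word in words:
        if not regex.search(word):
            print(word)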

words = [
    'nana',
    'abab',
    'nanac',
    'eded',
]

xstring("bn", words)
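
As a cross-check on the regex version, it may help to see why the loop in the question misfires: it prints a word once for every letter that is missing from it, so nana is printed because it lacks "b". A regex-free sketch of the intended filter tests all letters before deciding. The name `words_without` is hypothetical and not from the original answers.

```python
def words_without(letters, words):
    """Return the words that contain none of the characters in `letters`."""
    # all(...) is true only when every letter is absent from the word
    return [w for w in words if all(ch not in w for ch in letters)]

print(words_without("bn", ["nana", "abab", "nanac", "eded"]))  # ['eded']
```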

Answer 3

If you want to check whether a string contains any of a set of letters, use brackets.
For example, [bn] matches words that contain at least one of those letters.

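import re

substring = "bn"
# Character class matching any one of the letters in substring
regex = re.compile('[' + re.escape(substring) + ']')

def xstring():
    with open("deneme.txt") as f:
        for line in f:
            # Keep the line only if none of the letters occur in it
            if regex.search(line) is None:
                print(line.rstrip('\n'))

xstring()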
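
The question's code splits each line into words and has a commented-out `yield`, so a generator that checks word by word may be closer to what was intended. This is only a sketch under assumptions: `words_without` and its `path` parameter are made-up names, and a temporary file stands in for deneme.txt so the example runs on its own.

```python
import os
import re
import tempfile

def words_without(letters, path):
    """Yield each word in the file at `path` that has none of `letters`."""
    pattern = re.compile('[%s]' % re.escape(letters))
    with open(path) as f:
        for line in f:
            # Test each word separately so a line may hold several words
            for word in re.findall(r'\w+', line):
                if pattern.search(word) is None:
                    yield word

# Hypothetical sample file standing in for deneme.txt
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("nana\nabab\nnanac\neded\n")

print(list(words_without("bn", tmp.name)))  # ['eded']
os.remove(tmp.name)
```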

Answer 4

It might not be the most efficient approach, but you could use a set intersection. The following snippet prints the string word only if it contains neither 'b' nor 'n':

# An empty intersection is falsy, so this is true when no letter is shared
if not set(word) & set('bn'):
    print(word)
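
To run the set-based check over the whole word list, something like the sketch below could work. It swaps the explicit intersection for `set.isdisjoint`, which expresses the same test; the function name `words_without` is an assumption for illustration.

```python
def words_without(letters, words):
    """Return the words that share no character with `letters`."""
    banned = set(letters)  # built once, reused for every word
    # isdisjoint is True when the word and the banned set have nothing in common
    return [w for w in words if banned.isdisjoint(w)]

print(words_without("bn", ["nana", "abab", "nanac", "eded"]))  # ['eded']
```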

4 answers

Answer 1: score 4, 2019-06-01 21:27:45
Answer 2: score 2, accepted, 2019-06-01 21:26:10
Answer 3: score 0, 2019-06-01 21:13:18
Answer 4: score -1, 2019-06-01 21:24:35
