简体   繁体   中英

How to Find Words Not Containing Specific Letters?

I'm trying to write a code using regex and my text file. My file contains these words line by line:

nana
abab
nanac
eded

My purpose is: displaying the words which does not contain the letters which are given as substring's letters.

For example, if my substring is "bn" , my output should be only eded . Because nana and nanac contains "n" and abab contains "b".

I have written a code but it only checks first letter of my substring:

import re

substring = "bn"
def xstring():
    with open("deneme.txt") as f:
        for line in f:
            for word in re.findall(r'\w+', line):
                for letter in substring:
                    if len(re.findall(letter, word)) == 0:
                        print(word)
                        #yield word
xstring()

How do I solve this problem?

Here, we would just want to have a simple expression such as:

^[^bn]+$

We are adding b and n in a not-char class [^bn] and collecting all other chars, then by adding ^ and $ anchors we will be failing all strings that might have b and n .

Demo

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"^[^bn]+$"

test_str = ("nana\n"
    "abab\n"
    "nanac\n"
    "eded")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):
    
    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))
    
    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1
        
        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.  

RegEx

If this expression wasn't desired, it can be modified/changed in regex101.com .

RegEx Circuit

jex.im visualizes regular expressions:

在此处输入图片说明

@Xosrov has the right approach, with a few minor issues and typos. The below version of the same logic works

import re

def xstring(substring, words):
    regex = re.compile('[%s]' % ''.join(sorted(set(substring))))
    # Excluding words matching regex.pattern
    for word in words:
        if not re.search(regex, word):
            print(word)

words = [
    'nana',
    'abab',
    'nanac',
    'eded',
]

xstring("bn", words)

If you want to check if a string has a set of letters, use brackets.
For example using [bn] will match words that contain one of those letters.

import re
substring = "bn"
regex = re.compile('[' + substring + ']')
def xstring():
    with open("dename.txt") as f:
        for line in f:
            if(re.search(regex, line) is None):
                print(line)
xstring()

It might not be the most efficient but you could try doing something with set intersections the following code segment will print the the value in the string word only if it does not contain any of the letters 'b' or 'n'

if (not any(set(word) & set('bn'))):
        print(word)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM