
How to search for words containing certain letters in a txt file with Python?

Look at the code below. It looks for the letter 'b' in the text file and prints all the words containing 'b', right?

x = open("text file", "r")
for line in x:
    if "b" and in line: print line

searchfile.close()

Now here is my problem. I would like to search for several letters, not just one. For example, a and b both have to be in the same word. Then I want to print the list of words containing both letters.

And I'd like to have the user decide what the letters should be.

How do I do that?


After reading an answer, I've come up with something new.

x = open("text file", "r")

for line in x:
    if "b" in line and "c" in line and "r" in line: print line

Would this work instead? And how do I make the user enter the letters?

No. Your code (apart from being syntactically incorrect) will print every line that contains a "b", not the words.

To do what you want, we need more information about the text file. Supposing the words are separated by single spaces, you could do something like this:

# Keep only the words that contain both "a" and "b".
with open("file", "r") as x:
    words = [w for w in x.read().split() if "a" in w and "b" in w]

You could use sets for this:

letters = set(('l','e'))
for line in open('file'):
  if letters <= set(line):
    print line

In the above, letters <= set(line) tests whether every element of letters is present in the set of unique characters in line. Note that this matches whole lines, not individual words.
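
The same subset test can be applied to each word instead of each line. The following is only a minimal sketch, written for Python 3 (where print is a function); the name words_with_letters and the sample sentence are made up for illustration and are not part of the answer above.

```python
def words_with_letters(text, letters):
    """Return the words in `text` that contain every character in `letters`."""
    required = set(letters.lower())
    # required <= set(word) is True when every required letter occurs in the word.
    return [word for word in text.split() if required <= set(word.lower())]


# Hypothetical sample text, used only to show the call.
sample = "The yellow level bell rang late"
print(words_with_letters(sample, "le"))
```

Building the set of required letters once, outside the loop, avoids repeating that work for every word.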

x = open("text file", "r")
letters = raw_input('Enter the letters to match') # "ro" would match "copper" and "word"
letters = letters.lower()
for line in x:
    for word in line.split():
        if all(l in word.lower() for l in letters): # could optimize with sets if needed
            print word

First you need to split the contents of the file into a list of words. To do this you need to split on line breaks and on spaces, and possibly on hyphens too, depending on your data. You might want to use re.split, depending on how complicated the requirements are. But for this example let's just go:

words = []

with open('file.txt', 'r') as f:
  for line in f:
    # split() with no argument splits on any whitespace and drops the trailing newline
    words += line.split()
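
If hyphens should also separate words, re.split is one way to do it. This is a rough sketch under the assumption that whitespace and hyphens are the only separators; split_words is a hypothetical helper name, and the code is written for Python 3.

```python
import re


def split_words(text):
    """Split `text` on runs of whitespace and hyphens, dropping empty strings."""
    # [\s-]+ matches one or more whitespace characters (spaces, tabs, line breaks) or hyphens.
    return [word for word in re.split(r"[\s-]+", text) if word]


print(split_words("well-known words\nsplit across lines"))
```

The filter on empty strings matters because re.split returns an empty string when the text starts or ends with a separator.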

It will help efficiency if we only have to scan each word once, and presumably you only want a word to appear once in the final list anyway, so we convert this list to a set:

words = set(words)

Then, to get only those selected_words containing all of the letters in some other iterable letters:

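# Keep a word only if filtering letters by membership in the word drops nothing.
# list(letters) makes the comparison work when letters is a string or a set, too.
selected_words = [word for word in words if
  [letter for letter in letters if letter in word] == list(letters)]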

I think that should work. Any thoughts on efficiency? I don't know the details of how those list comprehensions run.
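
On the efficiency question: one common alternative, which is not the approach used above, is all() with a generator expression. It stops checking a word at the first missing letter instead of building a temporary list for every word. The sketch below is written for Python 3; select_words is a hypothetical name and the sample values are invented for illustration.

```python
def select_words(words, letters):
    """Return the distinct words containing every letter in `letters`, sorted for stable output."""
    # all() short-circuits: it stops at the first letter that is missing from the word.
    return sorted(word for word in set(words)
                  if all(letter in word for letter in letters))


# Hypothetical sample data; "crab" appears twice to show the de-duplication.
words = ["crab", "bar", "cab", "crab", "brace"]
print(select_words(words, "bcr"))
```

Either way, every distinct word still has to be examined, so the cost grows with the number of distinct words times the number of letters.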
