Searching a text file for every word in a list and printing the lines
I would like to search a .txt file for a list of words and print every line that contains any of the words in that list.
I first used .split() to split the raw_input (called userInput) into a word list. After that, I filtered this word list against a second, blacklist word list and got a final filtered word list. I want to search the text file for any of the words in this filtered list.
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
After splitting userInput with .split() and calling the result uqWords, I filtered out any words matching the exWords list and called the output fqWords. Now I want to search Database.txt for any word in the fqWords list and print the matching lines.
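One detail of the filter line is worth checking before any searching happens: bad in word is a substring test, so a blacklisted entry such as 'is' or 'am' also removes ordinary words like 'this' or 'name'. The sketch below compares that filter with an exact-match version on a made-up input; the names filter_substring and filter_exact are hypothetical, chosen only for illustration.

```python
# Same blacklist as in the question.
ex_words = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']


def filter_substring(words):
    # The question's filter: drop a word if ANY blacklisted string occurs inside it.
    return [word for word in words if not any(bad in word for bad in ex_words)]


def filter_exact(words):
    # Drop a word only if it is exactly equal to a blacklisted entry.
    return [word for word in words if word not in ex_words]


words = "Where is this name".split()
print(filter_substring(words))  # ['Where']  ('this' contains 'is', 'name' contains 'am')
print(filter_exact(words))      # ['Where', 'this', 'name']
```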
To be specific, my full code is:
import time
import random

Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "

while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    DB = open("Database.txt")
    for line in DB:
        if fqWords in line:
            print (R + line[:-1])
    CDB = open("CodeDB.txt")
    for code in CDB:
        if fqWords in code:
            print (R + code[:-1])
            break
    if fqWords not in (code and line):
        randomError = random.choice(Error)
        print (R + (randomError))
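The central problem in the code above is the test if fqWords in line: fqWords is a list, and the in operator on a string only accepts a string on its left side, so Python raises a TypeError instead of checking the words one by one. A minimal illustration, with a made-up line and word list standing in for the file contents:

```python
line = "the cat sat on the mat\n"
fq_words = ["cat", "dog"]

# What the question's code does: ask whether the *list itself* occurs in the string.
try:
    fq_words in line
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("list in str -> TypeError: {}".format(error_message))

# What was intended: is *any* word from the list a substring of the line?
found = any(word in line for word in fq_words)
print(found)  # True
```

The any(...) form checks each word separately and stops at the first match, which is the same test the answers rely on.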
Try using this function:
def search_for_lines(filename, words_list):
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
                print(line_no, ':', line)
                words_found += 1
    return words_found
Just pass the filename and the list of words you want to search for, and it will print the line number together with the line content, and return how many lines contained any of the words. As the file is iterated line by line, enumerate gives you tuples of the line number (counting from 0) and the line itself.
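To see what enumerate and any do without creating a file, the sketch below runs the same test over an in-memory io.StringIO object, which can be iterated line by line just like an open file. matching_lines is a hypothetical helper name; its start argument is passed to enumerate and controls whether numbering begins at 0 (as in the function above) or at 1.

```python
import io


def matching_lines(lines, words, start=0):
    """Return (line_number, line) pairs for lines containing any of the words.

    enumerate() yields (index, item) tuples; start=0 matches the function
    above, start=1 gives human-friendly line numbers.
    """
    return [(n, line.rstrip("\n"))
            for n, line in enumerate(lines, start)
            if any(word in line for word in words)]


# An in-memory "file" with three lines.
fake_file = io.StringIO(u"apple pie\nbanana split\ncherry tart\n")
print(matching_lines(fake_file, ["banana", "tart"]))
# [(1, 'banana split'), (2, 'cherry tart')]
```

Note that fake_file is consumed by the call, just as a real file object would be; iterating it a second time yields nothing.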
To add this to your existing code and search through both files, first define the function, and then call it just after your assignment of fqWords, like so:
from __future__ import print_function  # makes print(a, b, c) print plain values on Python 2
import random

def search_for_lines(filename, words_list):
    words_found = 0
    with open(filename) as db_file:
        for line_no, line in enumerate(db_file):
            if any(word in line for word in words_list):
                print(line_no, ':', line)
                words_found += 1
    return words_found

Error = ["Sorry, I don't understand.", "I don't get it"]
exWords = ['Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!']
R = "Rel > "

while True:
    userInput = raw_input("> ")
    uqWords = userInput.split()
    fqWords = [word for word in uqWords if not any(bad in word for bad in exWords)]
    # Count matches in both files, so a hit in either one ends the loop.
    words_found = search_for_lines("Database.txt", fqWords)
    words_found += search_for_lines("CodeDB.txt", fqWords)
    if words_found > 0:
        break
    else:
        randomError = random.choice(Error)
        print (R + (randomError))
If you don't need to modify a list, use a tuple. For naming identifiers, see PEP 8.
To get the difference of two sequences, use a set: for example, {1,2,3} - {2,3} is {1}.
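Applied to this problem, the set difference removes only exact matches, so 'this' survives even though it contains 'is'. Two side effects are worth knowing: a set does not keep the original word order, and multi-word entries such as 'How many' can never match, because split() only produces single words. A small sketch on a made-up input:

```python
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
uq_words = "What is this name".split()   # ['What', 'is', 'this', 'name']

# Keep every word that is not exactly equal to a blacklisted entry.
fq_words = frozenset(uq_words) - frozenset(ex_words)
print(sorted(fq_words))  # ['name', 'this']  (sorted, because sets are unordered)
```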
If you open the same files inside a loop, they get opened again on every iteration, so it is better to move the open calls out of the loop.
import random

def get_line_with_words(lines, words):
    """Return (line_number, line) pairs for every line
    that contains any of the words.
    """
    return [(i, line.strip()) for i, line in enumerate(lines, 1) if any(word in line for word in words)]

errors = ("Sorry, I don't understand.", "I don't get it")
ex_words = ('Who', 'How', 'What', 'How many', 'How much', 'am', 'is', 'are', '?', '!')
prefix = "Rel > "

with open("Database.txt") as db, open("CodeDB.txt") as cdb:
    while True:
        user_input = raw_input("> ")
        uq_words = user_input.split()
        fq_words = frozenset(uq_words) - frozenset(ex_words)
        res1 = get_line_with_words(db, fq_words)
        res2 = get_line_with_words(cdb, fq_words)
        if res1 or res2:  # a match in either file counts
            for n, line in res1 + res2:
                print('{} {} {}'.format(prefix, n, line))
            break
        print('{} {}'.format(prefix, random.choice(errors)))
        # Rewind both files so the next iteration reads them from the start.
        db.seek(0)
        cdb.seek(0)
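The seek(0) calls above are needed because a file object is exhausted after one pass. An alternative, assuming each file is small enough to hold in memory, is to read it into a list once before the loop, since a list can be iterated any number of times without rewinding. The helper names load_lines and search are hypothetical, and a temporary file created with tempfile stands in for Database.txt:

```python
import os
import tempfile


def load_lines(path):
    # Read the whole file once; assumes it fits comfortably in memory.
    with open(path) as f:
        return [line.rstrip("\n") for line in f]


def search(lines, words):
    # A list can be iterated repeatedly, so no seek(0) is needed between queries.
    return [(n, line) for n, line in enumerate(lines, 1)
            if any(word in line for word in words)]


# Demo: a throwaway file standing in for Database.txt.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("alpha beta\ngamma\nbeta delta\n")

db_lines = load_lines(path)          # opened and read once, before any loop
print(search(db_lines, ["beta"]))    # [(1, 'alpha beta'), (3, 'beta delta')]
print(search(db_lines, ["gamma"]))   # [(2, 'gamma')]
os.remove(path)
```

In the program above, the two load_lines calls would replace the with open(...) line, and the seek(0) calls would no longer be needed.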