How to check if a txt file contains keywords from another txt file?
I have an input txt file, for example:
The quick brown fox jumps over the lazy dog
The quick brown fox
A beautiful dog
I have the keywords saved in a txt file, for example:
fox dog ...
I want to check whether each line of the input file contains these keywords. I know how to check the keywords one by one:
with open("input.txt") as f:
    a_file = f.read().splitlines()
b_file = []
for line in a_file:
    if "dog" in line:
        b_file.append("dog")
    elif "fox" in line:
        b_file.append("fox")
    else:
        b_file.append("Not found")
with open('output.txt', 'w') as f:
    f.write('\n'.join(b_file) + '\n')
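But how do I check for keywords that are stored in another file? PS: I need to check each specific line, not the contents of the file as a whole. For example, the result should be: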
fox dog
fox
dog
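You should load both files. One holds the keywords to query and the other holds the content to search. For example, I have files named keywords.txt and content.txt, and open them together: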
with open("keywords.txt") as f1, open("content.txt") as f2:
    keywords = f1.read()
    content = f2.read()
# keywords: fox dog
# content: The quick brown fox jumps over the lazy dog\nThe quick brown fox\nA beautiful dog
If you only want to check whether the content contains any of the keywords, just do the following:
keywords = [line.split() for line in keywords.split("\n")]
keywords = sum(keywords, [])
# keywords: ['fox', 'dog']
content = [line.split() for line in content.split("\n")]
content = sum(content, [])
# content: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog', 'The', 'quick', 'brown', 'fox', 'A', 'beautiful', 'dog']
# check intersection of 2 sets, if there is some words overlap
# ==> keywords appear in the content
if set(keywords) & set(content):
    print(True)
else:
    print(False)
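Note that the check above only says whether any keyword occurs anywhere in the content. Since the question asks for a result per line, the same idea can be applied line by line. This is a minimal sketch, assuming the sample data from the question is held in strings rather than read from files; the function name keywords_per_line is made up for illustration:

```python
def keywords_per_line(keywords_text, content_text):
    """Return, for each content line, the keywords found in it as whole words."""
    keywords = keywords_text.split()
    results = []
    for line in content_text.splitlines():
        words = set(line.split())
        # keep the order of the keyword file; use a marker when nothing matches
        found = [k for k in keywords if k in words]
        results.append(" ".join(found) if found else "Not found")
    return results


sample_keywords = "fox dog"
sample_content = (
    "The quick brown fox jumps over the lazy dog\n"
    "The quick brown fox\n"
    "A beautiful dog"
)
print(keywords_per_line(sample_keywords, sample_content))
# ['fox dog', 'fox', 'dog']
```

The comparison here is case-sensitive and does not strip punctuation.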
Although you changed the requirements a bit, it seems you want something like this script:
with open('keywords.txt') as f:
    keywords = f.read().split()
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        if matches := [k for k in keywords if k in line]:
            o.write(f'{n+1}: {matches}\n')
With keywords.txt like:
fox dog
and document.txt like:
the quick brown fox
jumped over the lazy dog
on a beautiful dog day afternoon, you foxy dog
there is nothing on FOX
and sometimes you're in a foxhole with a dog
it writes the following to output.txt:
1: ['fox']
2: ['dog']
3: ['fox', 'dog']
5: ['fox', 'dog']
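Lines 3 and 5 report fox because `k in line` is a substring test, so foxy and foxhole count as hits. Here is a small sketch of the difference, using one line from the document.txt above (the variable names are only illustrative):

```python
line = "and sometimes you're in a foxhole with a dog"
keywords = ["fox", "dog"]

# substring test: 'fox' is found inside 'foxhole'
substring_hits = [k for k in keywords if k in line]

# whole-word test: compare against the words of the line instead
words = line.split()
word_hits = [k for k in keywords if k in words]

print(substring_hits)  # ['fox', 'dog']
print(word_hits)       # ['dog']
```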
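If you don't want partial matches (like foxhole), you care about the order in which the words are found, you may also want to know about duplicates, and you want matching to ignore case: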
with open('keywords.txt') as f:
    keywords = [k.lower() for k in f.read().split()]
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        if matches := [w for w in line.split() if w.lower() in keywords]:
            o.write(f'{n+1}: {matches}\n')
Finally, perhaps your document.txt has a line 6 with punctuation:
I watch "FOX", but although I search doggedly, I can't find a thing, you foxy dog!
Then this script:
import re
import string
with open('keywords.txt') as f:
    keywords = [k.lower() for k in f.read().split()]
with open('document.txt') as f, open('output.txt', 'w') as o:
    for n, line in enumerate(f):
        if matches := [w for w in re.sub('['+string.punctuation+']', '', line).split() if w.lower() in keywords]:
            o.write(f'{n+1}: {matches}\n')
writes this to output.txt:
1: ['fox']
2: ['dog']
3: ['dog', 'dog']
4: ['FOX']
5: ['dog']
6: ['FOX', 'dog']
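As an alternative to stripping punctuation first, a regular expression with word boundaries can pick out the keywords directly. This is a different technique from the script above and not part of the original answer. It is a minimal sketch, assuming a non-empty keyword list made up of plain words; the helper name find_keywords is hypothetical:

```python
import re


def find_keywords(line, keywords):
    """Return keyword occurrences in line, in order of appearance, ignoring case."""
    # \b anchors each alternative at word boundaries, so 'foxy' and
    # 'doggedly' are not matched; re.escape guards regex special characters
    pattern = r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b"
    return re.findall(pattern, line, flags=re.IGNORECASE)


line = 'I watch "FOX", but although I search doggedly, I can\'t find a thing, you foxy dog!'
print(find_keywords(line, ["fox", "dog"]))  # ['FOX', 'dog']
```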
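For anyone not familiar with Python, I want to extend Grismar's versatile answer with two goals: explaining the Python idioms it uses, and splitting the code into reusable functions.
- [expr for var in generator] is a list comprehension, used to build lists
- i, var in enumerate(list) uses enumerate to have both an index and an iteration variable inside a loop
- var := expr is the walrus operator (assignment expression) introduced with Python 3.8
An Enum (class) defines the 3 suggested matching modes. We can then use this mode for both:
- keywords_from, which reads the keywords from a file
- match_keywords, which finds the matches for those keywords
from enum import Enum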
class KeywordMatch(Enum):
    EXACT = 'exact'
    LOWER = 'lower'
    PARTIAL = 'partial'

# Usage: keywords = keywords_from('keywords.txt', KeywordMatch.LOWER)
def keywords_from(filename, mode):
    with open(filename) as f:
        if mode == KeywordMatch.LOWER:
            keywords = [k.lower() for k in f.read().split()]
        else:
            keywords = f.read().split()
    return keywords

import re
import string

# Usage: if match_keywords(line, KeywordMatch.LOWER):
# Note: reads the module-level keywords list
def match_keywords(line, mode):
    if mode == KeywordMatch.LOWER:
        # whole words of the line, compared in lowercase
        matches = [w for w in line.split() if w.lower() in keywords]
    elif mode == KeywordMatch.PARTIAL:
        # strip punctuation first, then compare whole words in lowercase
        matches = [w for w in re.sub('['+string.punctuation+']', '', line).split() if w.lower() in keywords]
    else:
        # substring test: every keyword that occurs anywhere in the line
        matches = [k for k in keywords if k in line]
    return matches
if __name__ == "__main__":
    mode = KeywordMatch.LOWER
    keywords = keywords_from('keywords.txt', mode)
    with open('document.txt') as f, open('output.txt', 'w') as o:
        for n, line in enumerate(f):
            matches = match_keywords(line, mode)
            # can also test or debug-print matches
            if matches:
                o.write(f'{n+1}: {matches}\n')
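Notes:
- the keywords list is still a global variable (not so clean)
- the matches are kept in a separate variable so they can be tested or debugged before being written to the file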
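One way to avoid the global keywords list is to pass it to the matching function as an argument. This is a minimal sketch that covers only the case-insensitive whole-word mode; the function name match_keywords_in and its signature are assumptions for illustration, not part of the answer above:

```python
def match_keywords_in(line, keywords):
    """Return the words of line that equal a keyword, ignoring case."""
    wanted = {k.lower() for k in keywords}  # set gives fast membership tests
    return [w for w in line.split() if w.lower() in wanted]


lines = ["the quick brown fox", "there is nothing on FOX", "no match here"]
for n, line in enumerate(lines, start=1):
    if matches := match_keywords_in(line, ["fox", "dog"]):
        print(f"{n}: {matches}")
# 1: ['fox']
# 2: ['FOX']
```

Because the function no longer depends on module state, it can be tested on its own with any keyword list.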