Find phrases from one text file in another text file with Python

I have one file which is a list of phrases, one phrase on each line. The other file is not delimited in any way; it's just one huge text file of words. I want to search for the phrases in the second file and, if they are found, print the phrase. This is the code I have so far:

f = open("phrase.txt", "r")
g = open("text.txt", "r")

for line in f:
    search=line.lower()


for word in g:
    if search in word:
        print(search)

This is not printing anything for me, though.

EDIT: I changed the code to this:

f = open('phrase.txt').readlines()
f = [f.strip('\n').lower() for f in f]
g = open('text.txt').read()
for phrase in f:
    if phrase in g:
        print (phrase)

Now I get the phrases that match. However, some of the phrases have dashes (-) and more letters after them, and they are not picked up by the program even if the part of the phrase before the dash is present in text.txt. Is there any way to change this?

If you want to search for every phrase in the file, you have to nest the loops. Currently, you are only searching for the last phrase:

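phrases = open("phrase.txt").readlines()

for phrase in phrases:
    # strip the trailing newline, otherwise the phrase only matches at the end of a line
    search = phrase.strip().lower()
    words = open("text.txt", "r")
    for word in words:  # each "word" is really one line of text.txt
        if search in word:
            print(search)
    words.close()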
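
To see why the code in the question prints nothing, here is a small sketch that replays its two loops on in-memory stand-ins for the files. The sample contents and the names phrase_file, text_file, last_search and matches are made up for illustration; they are not from the question.

```python
import io

# In-memory stand-ins for phrase.txt and text.txt (hypothetical contents).
phrase_file = io.StringIO("Hello World\ngood morning\n")
text_file = io.StringIO("hello world and good morning to everyone\n")

# First loop from the question: search is overwritten on every iteration.
for line in phrase_file:
    search = line.lower()

# Only the last phrase survives, and it still carries its newline.
last_search = search

# Second loop from the question, collecting matches instead of printing them.
matches = [search for chunk in text_file if search in chunk]
```

Here last_search ends up holding only the final phrase, including its trailing newline, so the substring test fails even though the words "good morning" do occur in the text.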

However, now things start to look funny, because you are asking whether a phrase is in a word, which doesn't seem right. So:

phrases = open("phrase.txt").readlines()
words = open("text.txt").read()

for phrase in phrases:
    all_words_found = True
    # split() with no argument also discards the trailing newline
    phrase_words = phrase.lower().split()
    for word in phrase_words:
        if word not in words:  # substring test against the whole text
            all_words_found = False
            break

    if all_words_found:
        print(phrase.strip())
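
A substring test such as "word not in words" also succeeds when the word is buried inside a longer word. If whole-word matches are what is wanted, one possible variation (not part of the answer above) is a regular expression with word boundaries. The helper name contains_phrase is hypothetical, and the sketch assumes each phrase starts and ends with a letter or digit, since that is where \b applies.

```python
import re

def contains_phrase(phrase, text):
    """Return True if phrase occurs in text as whole words, ignoring case."""
    # re.escape keeps characters such as '-' or '.' in the phrase literal.
    pattern = r"\b" + re.escape(phrase.strip().lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None

# "cat" is a substring of "concatenate" but not a whole word in this text.
inside_word = contains_phrase("cat\n", "Concatenate the lists")
whole_words = contains_phrase("the lists\n", "Concatenate the lists")
```

With these sample strings, inside_word is False and whole_words is True.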

I believe this is what you want:

f = open('phrase.txt').readlines()
f = [f.strip('\n').lower() for f in f]
g = open('text.txt').read()
words = g.split()

for phrase in f:
    search_words = phrase.split()
    for word in search_words:
        if word in words:
            print(phrase)
            break  # print each matching phrase only once
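
For the follow-up about dashes, a minimal sketch is shown here. It assumes the intent is that a phrase such as "ice cream - vanilla" should count as found whenever the part before the first dash appears in the text. The function name find_phrases and the sample data are hypothetical.

```python
def find_phrases(phrase_lines, text):
    """Return the phrases whose part before the first dash occurs in text."""
    haystack = text.lower()
    found = []
    for raw in phrase_lines:
        phrase = raw.strip().lower()
        if not phrase:
            continue  # skip blank lines in the phrase file
        # Keep only the text before the first dash (assumed intent).
        head = phrase.split("-", 1)[0].strip()
        if head and head in haystack:
            found.append(phrase)
    return found

# Sample data standing in for phrase.txt and text.txt.
phrases = ["Ice cream - vanilla\n", "hot dog\n", "\n", "apple pie\n"]
text = "We had ice cream and a Hot Dog at the fair."
result = find_phrases(phrases, text)
```

With the sample data, result is ["ice cream - vanilla", "hot dog"]. Note that this also cuts hyphenated words, so "well-known fact" would be searched as just "well"; splitting on " - " with surrounding spaces would avoid that if the phrase file uses that spacing consistently.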
