Using text in one file to search for a match in a second file

I'm using Python 2.6 on Linux.

I have two text files. first.txt has a single string of text on each line, so it looks like this:

lorem
ipus
asfd

The second file doesn't have quite the same format; it looks more like this:

1231 lorem
1311 assss 31 1
etc.

I want to take each line of text from first.txt and determine whether there's a match in the second file. If there isn't a match, I would like to save the missing text to a third file. I would like the comparison to ignore case, but that isn't strictly necessary. This is why I was looking at regular expressions, but I didn't have much luck.
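Since regular expressions came up, here is a minimal sketch of a case-insensitive, whole-word check using Python's `re` module. It assumes the contents of second.txt fit in memory as one string, and the helper name `word_in_text` is hypothetical. `re.escape` keeps characters such as `.` or `+` in a search word from being treated as regex syntax.

```python
import re

def word_in_text(word, text):
    """Return True if `word` appears in `text` as a whole word, ignoring case (hypothetical helper)."""
    # \b anchors the match at word boundaries, so "lore" does not match inside "lorem"
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

# Sample data from the question, held in a string instead of a file
second_text = "1231 lorem\n1311 assss 31 1\n"
for word in ["lorem", "ipus", "asfd"]:
    if not word_in_text(word, second_text):
        print(word)  # words from first.txt with no match in second.txt
```

Each call rescans the whole text, so for large files a set of words (built once) scales better than one regex search per word.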

So I'm opening the files, using readlines() to create a list from each one, iterating through the lists, and printing out the matches.
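One detail that often breaks this kind of comparison: `readlines()` keeps the trailing newline on each line. A small illustration, using `io.StringIO` in place of first.txt (an assumption made only so the snippet runs without files on disk):

```python
import io

# io.StringIO behaves like an open text file here
fake_file = io.StringIO(u"lorem\nipus\nasfd")
lines = fake_file.readlines()
print(lines)  # ['lorem\n', 'ipus\n', 'asfd'] (every line but the last keeps its newline)

# Strip the newline and surrounding whitespace before comparing
words = [line.strip() for line in lines]
print(words)  # ['lorem', 'ipus', 'asfd']
```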

Here's my code:

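first_file = open('first.txt', "r")
first = first_file.readlines()
first_file.close()

second_file = open('second.txt', "r")
second = second_file.readlines()
second_file.close()

# Compare every line of first.txt with every line of second.txt
i = 0
while i < len(first):
    j = first[i].strip()      # word from first.txt, without its newline
    k = 0                     # restart the inner loop for each word
    while k < len(second):
        m = second[k]
        if j and j in m:      # substring test; str.find() returns -1 (truthy) on a miss
            print(m.rstrip("\n"))
        k = k + 1
    i = i + 1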

It's definitely not elegant. Does anyone have suggestions on how to fix this, or a better solution?
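A common pitfall with this kind of matching is treating the result of `str.find` as a boolean, for example `if not j.find(m):`. `str.find` returns an index, or -1 when there is no match, so `not` of it is True only when the match sits at position 0. A short illustration using a line from the sample data:

```python
line = "1231 lorem"

# str.find returns the index of the first match, or -1 when there is no match
print(line.find("lorem"))   # 5
print(line.find("ipus"))    # -1

# -1 is truthy and 0 is falsy, so `not s.find(x)` is True only for a match at index 0
print(not line.find("1231"))   # True  (match at index 0)
print(not line.find("lorem"))  # False (match at index 5)
print(not line.find("ipus"))   # False (no match at all)

# The `in` operator expresses "is a substring of" directly
print("lorem" in line)  # True
print("ipus" in line)   # False
```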

My approach is this: read the second file, convert it to lowercase, and then create a list of the words it contains. Then convert this list into a set, for better performance with large files.

Then go through each line in the first file, and if it (also converted to lowercase, with surrounding whitespace removed) is not in the set we created, write it to the third file.

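with open("second.txt") as second_file:
    # Every whitespace-separated word in second.txt, lowercased
    second_values = set(second_file.read().lower().split())

with open("first.txt") as first_file:
    with open("third.txt", "wt") as third_file:
        for line in first_file:
            # Compare the lowercased line without its newline or surrounding spaces
            if line.lower().strip() not in second_values:
                # line usually already ends in "\n"; strip it so no blank line is added
                third_file.write(line.rstrip("\n") + "\n")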
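To check this logic without touching real files, here is a minimal sketch that wraps the same set-based idea in a function that returns the missing words instead of writing third.txt. The name `find_missing` is hypothetical, the demo writes the question's sample data to temporary files, and the sketch also skips blank lines, which would otherwise be reported as missing entries.

```python
import os
import shutil
import tempfile

def find_missing(first_path, second_path):
    """Return stripped lines of first_path whose lowercase form is not a word in second_path."""
    with open(second_path) as second_file:
        # Every whitespace-separated token in the second file, lowercased
        second_values = set(second_file.read().lower().split())
    missing = []
    with open(first_path) as first_file:
        for line in first_file:
            word = line.strip()
            # Skip blank lines; compare case-insensitively
            if word and word.lower() not in second_values:
                missing.append(word)
    return missing

# Demo with temporary copies of the sample data ("LOREM" is upper-cased on purpose)
tmp_dir = tempfile.mkdtemp()
first_path = os.path.join(tmp_dir, "first.txt")
second_path = os.path.join(tmp_dir, "second.txt")
with open(first_path, "w") as f:
    f.write("lorem\nipus\nasfd\n")
with open(second_path, "w") as f:
    f.write("1231 LOREM\n1311 assss 31 1\n")

result = find_missing(first_path, second_path)
print(result)  # ['ipus', 'asfd']
shutil.rmtree(tmp_dir)  # clean up the temporary files
```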

set objects are a simple, unordered container type that cannot contain duplicate values. They are designed to let you quickly add or remove items, or check whether an item is already in the set.
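A quick illustration of those properties, using the sample words from the question:

```python
words = set()
for w in ["lorem", "ipus", "lorem", "asfd"]:
    words.add(w)          # adding a value that is already present has no effect

print(len(words))         # 3 (the duplicate "lorem" is stored once)
print("ipus" in words)    # True (fast membership test)
words.remove("ipus")      # raises KeyError if the value is absent
words.discard("ipus")     # like remove(), but silent if the value is absent
print("ipus" in words)    # False
print(sorted(words))      # ['asfd', 'lorem'] (sets are unordered, so sort for stable output)
```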

with statements are a convenient way to ensure that a file is closed, even if an exception occurs. They are enabled by default from Python 2.6 onwards; in Python 2.5 you need to put the line from __future__ import with_statement at the top of your file.
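A small sketch of that guarantee: the file is closed even though an exception escapes the `with` block. It reads a throwaway file created with `tempfile.mkstemp()`, used purely so the snippet runs anywhere.

```python
import os
import tempfile

# Create an empty temporary file to open
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path) as f:
        raise ValueError("something went wrong while reading")
except ValueError:
    pass

print(f.closed)  # True (the with block closed the file despite the exception)
os.remove(path)
```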

The in operator does what it sounds like: it tells you whether a value can be found in a collection. When used with a list, it checks the elements one at a time, like your code does; when used with a set, it uses hashing, which is much faster for large collections. not in does the opposite. (A possible point of confusion: in is also used in for loops, as in for x in [1, 2, 3], but that use is unrelated.)
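A rough comparison of the two kinds of lookup, written for Python 3. The 100,000 numeric strings are arbitrary test data, and the exact timings will depend on the machine; only the relative difference is of interest.

```python
import timeit

values_list = [str(n) for n in range(100000)]
values_set = set(values_list)

# Both containers give the same answer...
print("99999" in values_list, "99999" in values_set)  # True True

# ...but the list is scanned element by element, while the set does a hash lookup
list_time = timeit.timeit(lambda: "missing" in values_list, number=100)
set_time = timeit.timeit(lambda: "missing" in values_set, number=100)
print("list: %.4fs  set: %.6fs" % (list_time, set_time))
```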

Assuming that you're looking for the entire line in the second file:

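second_file = open('second.txt', "r")
# Strip trailing newlines so a last line without "\n" still compares equal
second = [l.rstrip("\n") for l in second_file.readlines()]
second_file.close()

first_file = open('first.txt', "r")
for line in first_file:
    line = line.rstrip("\n")
    if line not in second:     # whole-line comparison, case-sensitive
        print(line)

first_file.close()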
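Whether whole-line matching is what you want depends on the data. In the sample from the question, second.txt prefixes each word with numbers, so no line of first.txt equals a whole line of second.txt. This in-memory comparison (plain lists stand in for the two files) shows the difference between whole-line and word matching:

```python
first_lines = ["lorem", "ipus", "asfd"]
second_lines = ["1231 lorem", "1311 assss 31 1"]

# Whole-line matching: an entry must equal an entire line of second.txt
whole_line_missing = [w for w in first_lines if w not in second_lines]

# Word matching: an entry only needs to equal one token on some line (case-insensitive)
second_words = set()
for line in second_lines:
    second_words.update(line.lower().split())
word_missing = [w for w in first_lines if w.lower() not in second_words]

print(whole_line_missing)  # ['lorem', 'ipus', 'asfd']
print(word_missing)        # ['ipus', 'asfd']
```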

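Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.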

© 2020-2024 STACKOOM.COM