
Compare lines in two files efficiently in Python

I am trying to compare the lines of two files and capture the lines that match. For example,

file1.txt contains

my
sure

file2.txt contains

my : 2
mine : 5
sure : 1

and I am trying to output

my : 2
sure : 1

I have the following code so far:

inFile = "file1.txt"
dicts = "file2.txt"


with open(inFile) as f:
    content = f.readlines()

content = [x.strip() for x in content]

with open(dicts) as fd:
    inDict = fd.readlines()

inDict = [x.strip() for x in inDict]

ordered_dict = {}

for line in inDict:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value

for (key, val) in ordered_dict.items():
    for entry in content:
        if entry == key:
            print(key, val)
        else:
            continue

However, this is very inefficient because the nested loops compare every key in file2 against every line in file1, so it is not suitable for large files. How can I make this workable for large files?

You don't need nested loops. Use one loop to read file2 and build a dict, and a second loop to read file1 and look up each line in that dict.

inFile = "file1.txt"
dicts = "file2.txt"

ordered_dict = {}
with open(dicts) as fd:
    for line in fd:
        # strip() removes the trailing newline so it does not end up in the value
        a, b = line.strip().split(' : ')
        ordered_dict[a] = b

with open(inFile) as f:
    for line in f:
        line = line.strip()
        if line in ordered_dict:
            print( line, ":", ordered_dict[line] )

The first loop can also be written as a generator expression passed to dict():

with open(dicts) as fd:
    ordered_dict = dict( line.strip().split(' : ') for line in fd )
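If file2 may contain blank or malformed lines, `split(' : ')` will raise a `ValueError` when unpacking. Below is a minimal sketch of the same two-pass idea that tolerates such lines by using `str.partition`. The function names `load_counts` and `matching_lines` are made up for illustration, and the in-memory lists stand in for the two files; with real files you would pass the open file objects instead.

```python
def load_counts(lines):
    """Build {key: value} from lines shaped like 'key : value'."""
    counts = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            # No colon found: blank or malformed line, skip it
            continue
        counts[key.strip()] = value.strip()
    return counts


def matching_lines(keys, counts):
    """Yield 'key : value' for every key line that appears in counts."""
    for line in keys:
        key = line.strip()
        if key in counts:  # dict lookup, average constant time
            yield f"{key} : {counts[key]}"


# Sample data standing in for file1.txt and file2.txt (assumed contents)
file1_lines = ["my\n", "sure\n"]
file2_lines = ["my : 2\n", "mine : 5\n", "\n", "sure : 1\n"]

counts = load_counts(file2_lines)
results = list(matching_lines(file1_lines, counts))
print(results)
```

Because each file is read once and each lookup is a hash lookup, the work grows roughly with the combined size of the two files rather than with their product.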

Here is a solution with one for loop:

inFile = "file1.txt"
dicts = "file2.txt"


with open(inFile) as f:
    # A set of stripped lines gives fast membership tests
    content_set = set(map(str.strip, f))

ordered_dict = {}
with open(dicts) as fd:
    for dline in fd:
        key, val = dline.strip().split(" : ")

        # Keep only the entries whose key appears in file1
        if key in content_set:
            ordered_dict[key] = val
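When file2 is the larger of the two files, it may be preferable to hold only the file1 keys in memory and stream file2 line by line. The following is a sketch of that variant, not a drop-in replacement for either answer; `iter_matches` is a hypothetical helper name, and `io.StringIO` objects stand in for the open files.

```python
import io


def iter_matches(wanted_lines, dict_lines):
    """Yield each 'key : value' line whose key appears in wanted_lines."""
    wanted = {line.strip() for line in wanted_lines}
    wanted.discard("")  # ignore blank lines in the key file
    for line in dict_lines:
        # Split once on the first colon; the key is everything before it
        key = line.split(":", 1)[0].strip()
        if key in wanted:
            yield line.strip()


# In-memory stand-ins for file1.txt and file2.txt (assumed contents)
file1 = io.StringIO("my\nsure\n")
file2 = io.StringIO("my : 2\nmine : 5\nsure : 1\n")

matches = list(iter_matches(file1, file2))
print(matches)
```

Note that the output follows the order of file2 rather than file1, and duplicate lines in file1 collapse into one entry in the set.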
