Compare lines in two files efficiently in Python

I am trying to compare the lines of two files and capture the lines that match. For example,

file1.txt contains

my
sure

file2.txt contains

my : 2
mine : 5
sure : 1

and I am trying to output

my : 2
sure : 1

I have the following code so far

inFile = "file1.txt"
dicts = "file2.txt"


with open(inFile) as f:
    content = f.readlines()

content = [x.strip() for x in content]

with open(dicts) as fd:
    inDict = fd.readlines()

inDict = [x.strip() for x in inDict]

ordered_dict = {}

for line in inDict:
    key = line.split(":")[0].strip()
    value = int(line.split(":")[1].strip())
    ordered_dict[key] = value

# Nested loops: every key from file2 is compared against every entry from file1
for (key, val) in ordered_dict.items():
    for entry in content:
        if entry == key:
            print(key, ":", val)

However, this is very inefficient: the nested loops compare every key from file2 against every line of file1, so the work grows with the product of the two file sizes. That is not practical for large files. How can I make this workable for large files?
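To make the cost concrete, here is a small in-memory sketch that counts the work done by the nested-loop approach versus one dict lookup per word, using the question's example data already parsed. The function names `nested_matches` and `dict_matches` are hypothetical, and this only illustrates the complexity difference; it is not a benchmark.

```python
def nested_matches(pairs, words):
    """Question's approach: compare every key with every word."""
    comparisons = 0
    matches = []
    for key, val in pairs.items():
        for word in words:
            comparisons += 1
            if word == key:
                matches.append((key, val))
    return matches, comparisons


def dict_matches(pairs, words):
    """Dict-based approach: one hash lookup per word."""
    lookups = 0
    matches = []
    for word in words:
        lookups += 1
        if word in pairs:
            matches.append((word, pairs[word]))
    return matches, lookups


# The example data from the question, already parsed.
pairs = {"my": 2, "mine": 5, "sure": 1}
words = ["my", "sure"]
print(nested_matches(pairs, words))  # ([('my', 2), ('sure', 1)], 6)
print(dict_matches(pairs, words))    # ([('my', 2), ('sure', 1)], 2)
```

With 3 keys and 2 words the nested version does 3 × 2 = 6 comparisons, while the dict version does 2 lookups. With a million lines in each file that becomes 10^12 comparisons versus 10^6 lookups.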

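You don't need nested loops. Use one loop to read file2 into a dict, and another loop to read file1 and look each line up in that dict. A dict lookup takes constant time on average, so the total work is proportional to the combined size of the two files rather than their product.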

inFile = "file1.txt"
dicts = "file2.txt"

ordered_dict = {}
with open(dicts) as fd:
    for line in fd:
        # Strip the trailing newline before splitting "key : value"
        a, b = line.strip().split(' : ')
        ordered_dict[a] = b

with open(inFile) as f:
    for line in f:
        line = line.strip()
        if line in ordered_dict:
            print(line, ":", ordered_dict[line])
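The same two-pass idea can be wrapped in a reusable function. This is a minimal sketch: `find_matches` is a hypothetical name, and it assumes every non-blank line of file2 has the form `key : value`. Splitting on the first `:` and stripping both halves tolerates uneven spacing around the colon, and blank lines are skipped.

```python
import os
import tempfile


def find_matches(words_path, pairs_path):
    """Return (word, value) for each word in words_path that is a key in pairs_path.

    Assumes pairs_path holds lines like "key : value"; blank lines are skipped.
    """
    # Pass 1: load the "key : value" file into a dict (average O(1) lookups).
    lookup = {}
    with open(pairs_path) as fd:
        for line in fd:
            if not line.strip():
                continue  # skip blank lines, e.g. a stray empty line at the end
            key, value = line.split(":", 1)
            lookup[key.strip()] = value.strip()

    # Pass 2: stream the word file, doing one dict lookup per line.
    matches = []
    with open(words_path) as f:
        for line in f:
            word = line.strip()
            if word in lookup:
                matches.append((word, lookup[word]))
    return matches


# Usage: recreate the question's example files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    words_file = os.path.join(tmp, "file1.txt")
    pairs_file = os.path.join(tmp, "file2.txt")
    with open(words_file, "w") as f:
        f.write("my\nsure\n")
    with open(pairs_file, "w") as f:
        f.write("my : 2\nmine : 5\nsure : 1\n")
    for word, value in find_matches(words_file, pairs_file):
        print(word, ":", value)  # prints "my : 2", then "sure : 1"
```

Because the second pass streams file1 one line at a time, only file2 has to fit in memory, and the matches come out in file1's order.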

The first loop can be condensed into one line by passing a generator expression to dict():

with open(dicts) as fd:
    ordered_dict = dict(line.strip().split(' : ') for line in fd if line.strip())
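If file2 may contain blank lines or inconsistent spacing around the colon, a dict comprehension with a filter is a slightly more forgiving variant. This is a sketch: it uses `io.StringIO` in place of a real file so it runs on its own, and it still assumes each non-blank line contains at least one `:`.

```python
import io

# Stand-in for open("file2.txt"); note the uneven spacing and the blank line.
fd = io.StringIO("my : 2\nmine:5\n\nsure  :  1\n")

# Split each non-blank line on the first ":" and strip whitespace from both halves.
ordered_dict = {
    key.strip(): value.strip()
    for key, value in (line.split(":", 1) for line in fd if line.strip())
}

print(ordered_dict)  # {'my': '2', 'mine': '5', 'sure': '1'}
```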

Here is a solution with one for loop:

inFile = "file1.txt"
dicts = "file2.txt"

# Store file1's words in a set: membership tests are O(1) on average,
# whereas "in" on a list scans the whole list
with open(inFile) as f:
    content_set = {line.strip() for line in f}

ordered_dict = {}
with open(dicts) as fd:
    for dline in fd:
        key, val = dline.strip().split(" : ")
        if key in content_set:
            ordered_dict[key] = val
            print(key, ":", val)
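Which file to hold in memory is a design choice: when file2 is the large one and file1 is small, keeping only file1's words in a set and streaming file2 bounds memory use by the smaller file. A minimal sketch as a generator follows; `iter_matches` is a hypothetical name, the `key : value` format is assumed, and `io.StringIO` stands in for real files.

```python
import io


def iter_matches(words_file, pairs_file):
    """Yield (key, value) for each line of pairs_file whose key appears in words_file.

    Only the words are held in memory; pairs_file is read one line at a time.
    """
    wanted = {line.strip() for line in words_file if line.strip()}
    for line in pairs_file:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.split(":", 1)
        key = key.strip()
        if key in wanted:
            yield key, value.strip()


# io.StringIO stands in for open(...) so the example is self-contained.
words = io.StringIO("my\nsure\n")
pairs = io.StringIO("my : 2\nmine : 5\nsure : 1\n")
for key, value in iter_matches(words, pairs):
    print(key, ":", value)
```

Here the matches come out in file2's order, since file2 drives the loop.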
