Finding common lines in 2 different files

Question

I am trying to find the lines common to two different files and write them to a new text file. I wrote the script below, but it does not find the common lines; it only writes out whatever is in the file I pass as arg2. Please help me troubleshoot.

#!/usr/bin/python

import sys


def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)

    fh1.close()
    fh2.close()


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)

Basically, given File A and File B below, I want the script to write File C:

File A

AAB
BBC
DDE
GGC

File B

123
AAB
DDE
345
GHY
GJK

File C

AAB
DDE

Thanks!!!

Answer 1

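First of all, "fh1 and fh2" does not combine the two files. The "and" operator returns its second operand whenever the first one is truthy, and a file object is always truthy, so the expression evaluates to fh2 and the for loop only ever iterates over the second file. Every line read from a file is non-empty (it at least contains "\n"), so "if line:" is always true and the script simply copies file 2. Try changing the code to something along these lines: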

for line in fh1 and fh2:
    if line:
        f.write(line)

to

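lines2 = set(line2.strip() for line2 in fh2)  # read file 2 once; "in" on a file object would consume it
for line in fh1:
    if line.strip() in lines2:  # strip so "GGC" and "GGC\n" compare equal
        f.write(line.strip() + '\n')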
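Two Python behaviours are behind the original bug: "and" returns its second operand when the first is truthy, and iterating over a file object (which is what an "in" test on it does) consumes the lines it reads. A quick demo, using io.StringIO objects as stand-ins for the open files (an assumption for the demo; real file objects behave the same way here):

```python
import io

# io.StringIO stands in for open(...) so the demo needs no files on disk
fh1 = io.StringIO("AAB\nBBC\n")
fh2 = io.StringIO("123\nAAB\n")

# 'and' returns its second operand when the first is truthy;
# a file object is always truthy, so this is just fh2
combined = fh1 and fh2
print(combined is fh2)  # True

# A file object is an iterator: a membership test consumes lines,
# so a second 'in' test starts where the first one stopped
print("AAB\n" in fh2)   # True (reads up to and including the AAB line)
print("123\n" in fh2)   # False (that line was already consumed)
```

This is why the corrected version reads file 2 into a set once before looping over file 1.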

Answer 2

You can use the Python library pandas for this.

First import it (import pandas as pd), then create a DataFrame for each .txt file, as below:

In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)

In [2018]: df_A
Out[2018]: 
     0
0  AAB
1  BBC
2  DDE
3  GGC

In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)

In [2020]: df_B
Out[2020]: 
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK

Now, merge the two DataFrames with an inner join on column 0 to keep only the rows common to both:

In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')

In [2022]: df_C
Out[2022]: 
     0
0  AAB
1  DDE

Then you can write the result to a file, as below:

In [2023]: df_C.to_csv('out.csv', index=False, header=False)  # header=False: don't write the column name "0" as the first line

The matching happens inside pandas rather than in a hand-written Python loop, and no regular expressions are needed, so the code stays short and readable.
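For readers without pandas, here is a stdlib-only sketch of what the inner join computes on this data. pd.merge with how='inner' keeps the left frame's row order, so the equivalent is a filter over File A's lines with a set of File B's lines. This assumes each line occurs at most once per file (with duplicates, merge emits one row per matching pair); common_lines is a hypothetical helper name:

```python
# Lines of File A and File B from the question
file_a = ["AAB", "BBC", "DDE", "GGC"]
file_b = ["123", "AAB", "DDE", "345", "GHY", "GJK"]


def common_lines(left, right):
    """Hypothetical helper: keep lines of `left` that also occur in `right`,
    in `left` order, like an inner join on a single column of unique values."""
    right_set = set(right)  # average O(1) membership checks
    return [line for line in left if line in right_set]


print(common_lines(file_a, file_b))  # ['AAB', 'DDE']
```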

Let me know if this helps.

Answer 3

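Try using a dictionary: store each line of the first file as a key, then write out every line of the second file that is also a key.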

import sys


def find_common_lines(arg1, arg2, arg3):
    # Store every (stripped) line of the first file as a dictionary key
    alllines_dict = {}
    with open(arg1, 'r') as f:
        for line in f:
            alllines_dict[line.strip()] = 1

    # Write each line of the second file that also appears in the first
    with open(arg3, 'w') as out, open(arg2, 'r') as f:
        for line2 in f:
            line2 = line2.strip()
            if line2 in alllines_dict:
                out.write(line2 + '\n')


number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
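Since the dictionary values are never used (only the keys matter), a set does the same job. A minimal sketch of that variant, checked against the sample files from the question in a temporary directory; find_common_lines_set and the A.txt/B.txt/C.txt names are illustrative assumptions:

```python
import os
import tempfile


def find_common_lines_set(path1, path2, out_path):
    # Only membership matters, so a set replaces the dictionary
    with open(path1) as f:
        seen = {line.strip() for line in f}
    # Keep the order of the second file, as the dictionary version does
    with open(path2) as f, open(out_path, 'w') as out:
        for line in f:
            line = line.strip()
            if line in seen:
                out.write(line + '\n')


# Try it on File A and File B from the question
with tempfile.TemporaryDirectory() as tmp:
    a = os.path.join(tmp, 'A.txt')
    b = os.path.join(tmp, 'B.txt')
    c = os.path.join(tmp, 'C.txt')
    with open(a, 'w') as f:
        f.write("AAB\nBBC\nDDE\nGGC\n")
    with open(b, 'w') as f:
        f.write("123\nAAB\nDDE\n345\nGHY\nGJK\n")
    find_common_lines_set(a, b, c)
    with open(c) as f:
        result = f.read()

print(result, end='')  # prints AAB and DDE, one per line
```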

Finding common lines in 2 different files

Question

3 answers

solution1
1 2018-11-14 18:37:10

solution2
0 2018-11-14 18:37:11

solution3
0 ACCEPTED 2018-11-14 18:47:42