I am trying to find the lines common to two files and write them to a new text file. I wrote the script below, but it does not find the common lines; it only writes out the contents of the file I pass as arg2. Please help me troubleshoot.
#!/usr/bin/python
import sys

def find_common_lines(arg1, arg2, arg3):
    fh1 = open(arg1, 'r+')
    fh2 = open(arg2, 'r+')
    with open(arg3, 'w+') as f:
        for line in fh1 and fh2:
            if line:
                f.write(line)
    fh1.close()
    fh2.close()

number_of_arguments = len(sys.argv) - 1
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
So, basically what I want this script to do is:
File A
AAB
BBC
DDE
GGC
File B
123
AAB
DDE
345
GHY
GJK
File C
AAB
DDE
Thanks!!!
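First of all, the "and" operator does not combine the two files. It returns its second operand whenever the first one is truthy, and an open file object is always truthy, so "fh1 and fh2" is simply fh2. The loop therefore iterates over the second file only, which is why the output is a copy of it. The "if line:" test does not filter anything either, because every line read from a file is a non-empty string. Instead, read one file into a set and test each line of the other file against it. Try changing:
    for line in fh1 and fh2:
        if line:
            f.write(line)
to
    lines2 = set(fh2)
    for line in fh1:
        if line in lines2:
            f.write(line)
Note that the comparison includes the trailing newline, so the last line of a file that does not end with a newline will not match the same text followed by a newline in the other file.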
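To see why the original loop only ever copies the second file, here is a small sketch that looks at what "fh1 and fh2" evaluates to. It uses io.StringIO objects as stand-ins for the two open files (an assumption made so the snippet runs without any files on disk); like real file objects, they are truthy.

```python
import io

# In-memory stand-ins for the two open files (hypothetical sample data).
fh1 = io.StringIO("AAB\nBBC\n")
fh2 = io.StringIO("123\nAAB\n")

# `a and b` returns b when a is truthy; a file-like object is truthy,
# so the whole expression is just fh2.
result = fh1 and fh2
print(result is fh2)

# Iterating over the expression therefore reads lines from fh2 only.
lines_seen = [line.rstrip("\n") for line in (fh1 and fh2)]
print(lines_seen)
```

The first print shows that the expression is the second file object itself, and the list contains only the second file's lines.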
You can use Python's pandas library for this (the session below assumes it has been imported as pd).
Create a DataFrame for each .txt file like below:
In [2017]: df_A = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/A.txt', header=None)
In [2018]: df_A
Out[2018]:
     0
0  AAB
1  BBC
2  DDE
3  GGC
In [2019]: df_B = pd.read_fwf('/home/mayankp/Documents/Personal/stackoverflow/B.txt', header=None)
In [2020]: df_B
Out[2020]:
     0
0  123
1  AAB
2  DDE
3  345
4  GHY
5  GJK
Now merge both DataFrames with an inner join to keep only the rows common to both:
In [2021]: df_C = pd.merge(df_A, df_B, on=0, how='inner')
In [2022]: df_C
Out[2022]:
     0
0  AAB
1  DDE
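Then you can write this output to a file like below. Passing header=False keeps the column label 0 out of the file, so it contains only the common lines:
In [2023]: df_C.to_csv('out.csv', index=False, header=False)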
No explicit loops or regular expressions need to be written this way, so the code stays short and clean.
Let me know if this helps.
Try using a dictionary:
import sys

def find_common_lines(arg1, arg2, arg3):
    # Store every line of the first file as a dictionary key for fast lookup.
    alllines_dict = {}
    with open(arg1, 'r') as f:
        while True:
            line = f.readline()
            if not line:  # empty string means end of file
                break
            alllines_dict[line.strip()] = 1
    # Write each line of the second file that also occurs in the first.
    with open(arg3, 'w') as out:
        with open(arg2, 'r') as f:
            while True:
                line2 = f.readline()
                if not line2:
                    break
                line2 = line2.strip()
                ispresent = alllines_dict.get(line2, None)
                if ispresent is not None:
                    out.write(line2 + '\n')

number_of_arguments = len(sys.argv)-1
print(sys.argv)
if number_of_arguments < 3:
    print("ERROR:\tThe script is called with less than 3 arguments, but it needs 3!")
    print("Usage:\tfind_common_lines.py <file1> <file2> <output_filepath>")
else:
    arg1 = sys.argv[1]
    arg2 = sys.argv[2]
    arg3 = sys.argv[3]
    find_common_lines(arg1, arg2, arg3)
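The same lookup idea can be written more compactly with a set and plain "for" loops over the file objects. This is only a sketch, not a drop-in replacement: the function name write_common_lines and its return value are made up for illustration, and it strips only the trailing newline, whereas the dictionary version above strips all surrounding whitespace.

```python
def write_common_lines(path_a, path_b, out_path):
    """Write the lines of path_b that also occur in path_a to out_path.

    Hypothetical helper for illustration; returns the common lines as a
    list, in the order they appear in path_b.
    """
    # Collect the first file's lines (without the newline) in a set.
    with open(path_a) as fa:
        seen = {line.rstrip("\n") for line in fa}

    common = []
    with open(path_b) as fb, open(out_path, "w") as out:
        for line in fb:
            text = line.rstrip("\n")
            if text in seen:  # set membership test, no inner loop needed
                out.write(text + "\n")
                common.append(text)
    return common
```

With the File A and File B contents from the question, this should write AAB and DDE to the output file. A line that occurs several times in the second file is written once per occurrence.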