I have two files like these:

file 1:
col1 col2
john kerry
adam lord
joe hitch

file 2:
col1 col2
john kerry
bob abram
I would like to compare the two files by first name and last name and produce a file that contains only the people from file 1 who do not appear in file 2, that is:
Desired output file:
col1 col2
adam lord
joe hitch
I tried this but I don't get the right output
import csv
reader1=csv.reader(open('file1.csv', 'r'), delimiter='\t')
reader2=csv.reader(open('file2.csv', 'r'), delimiter='\t')
writer=csv.writer(open('desired_file.csv', 'w'), delimiter=',')
row1 = reader1.next()
row2 = reader2.next()
if (row1[0] == row2[0]) and (row1[1] == row2[1]):
    print 'equal'
else:
    writer.writerow(row1)
    writer.writerow(row2)
I'd use a set difference:
with open('file1') as f1, open('file2') as f2:
    data1 = set(f1)
    lines_not_in_f2 = data1.difference(f2)
If the formatting of the files can be slightly different, you might need to wrap the file objects in a generator which yields tuples:
def people(my_file):
    # Yield each line as a lowercased tuple of its whitespace-separated fields.
    for line in my_file:
        yield tuple(x.lower() for x in line.split())

with open('file1') as f1, open('file2') as f2:
    data1 = set(people(f1))
    people_not_in_f2 = data1.difference(people(f2))
This has the advantage that you don't need to read the entire f2 file into memory. It has the disadvantage that the output names are unordered (since they are stored in a set).
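If the original order matters, one possible variation is to put the names from file 2 into the set and filter file 1 line by line. The sketch below is only an illustration under that assumption: the helper names `normalize` and `keep_missing` are made up here, and the sample data is taken from the question with the header rows left out. The trade-off is that file 2, rather than file 1, is held in memory.

```python
def normalize(line):
    """Turn 'John  Kerry\n' into ('john', 'kerry') so formatting differences are ignored."""
    return tuple(word.lower() for word in line.split())


def keep_missing(lines1, lines2):
    """Return the lines of lines1, in order, whose normalized form is absent from lines2."""
    seen = {normalize(line) for line in lines2}
    return [line for line in lines1 if normalize(line) not in seen]


# Sample data from the question (header rows omitted for brevity).
file1_lines = ["john kerry\n", "adam lord\n", "joe hitch\n"]
file2_lines = ["John Kerry\n", "bob abram\n"]

result = keep_missing(file1_lines, file2_lines)
print(result)  # ['adam lord\n', 'joe hitch\n']
```

Both arguments can be any iterable of lines, so open file objects work in place of the lists.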
I think you do not need the csv module if the file formats are the same. How about this solution:
exclude_names = frozenset(open('file2'))  # make set for performance
with open('output', 'w') as f:
    for name in open('file1'):
        if name not in exclude_names:
            f.write(name)
Solution with csv reader/writer:
import csv
# csv.reader yields lists, which are not hashable, so convert each row to a tuple.
exclude_names = frozenset(tuple(row) for row in csv.reader(open('file2.csv', 'r'), delimiter='\t'))
with open('desired_file.csv', 'w') as f:
    writer = csv.writer(f, delimiter=',')
    for row in csv.reader(open('file1.csv', 'r'), delimiter='\t'):
        if tuple(row) not in exclude_names:
            writer.writerow(row)
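One thing to watch for: the header row `col1 col2` appears in both files, so a plain "not in file 2" filter drops it from the output, while the desired output keeps it. Below is a minimal Python 3 sketch of one way to handle that, assuming tab-delimited input with exactly one header row as in the question's code. The function name `write_missing` is made up for illustration, and `io.StringIO` stands in for real files so the example runs on its own. Rows are converted to tuples because `csv.reader` yields lists, which cannot be stored in a set.

```python
import csv
import io


def write_missing(src, exclude, out):
    """Copy rows from src to out unless they also appear in exclude; keep src's header."""
    exclude_reader = csv.reader(exclude, delimiter='\t')
    next(exclude_reader, None)  # skip the header row of file 2
    exclude_rows = {tuple(row) for row in exclude_reader}

    src_reader = csv.reader(src, delimiter='\t')
    writer = csv.writer(out, delimiter=',', lineterminator='\n')
    header = next(src_reader, None)
    if header is not None:
        writer.writerow(header)  # always keep the header of file 1
    for row in src_reader:
        if tuple(row) not in exclude_rows:
            writer.writerow(row)


# In-memory stand-ins for file1.csv and file2.csv from the question.
file1 = io.StringIO("col1\tcol2\njohn\tkerry\nadam\tlord\njoe\thitch\n")
file2 = io.StringIO("col1\tcol2\njohn\tkerry\nbob\tabram\n")
out = io.StringIO()
write_missing(file1, file2, out)
print(out.getvalue())
```

With real files, open them with `newline=''`, as the csv module documentation recommends, and pass the file objects in place of the `StringIO` buffers.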
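results = [i for i, j in zip(reader1, reader2) if i != j]
Note that zip pairs the rows by position, so this only works if both files list the people in the same order. If the order is not important, use set(map(tuple, reader1)) - set(map(tuple, reader2)) instead; the rows are lists, so they must be converted to tuples before they can go into a set.
myfile = open(..., 'wb')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
wr.writerows(results)  # writerows writes one output row per person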