I have two CSV files like this:
"id","h1","h2","h3", ...
"1","blah","blahla"
"4","bleh","bleah"
I'd like to merge the two files so that if there's the same id in both files, the values of the row should come from the second file. If they have different ids, then the merged file should contain both rows.
Some values contain commas:
"54","34,2,3","blah"
res = {}
a = open('a.csv')
for line in a:
    (id, rest) = line.split(',', 1)
    res[id] = rest
a.close()
b = open('b.csv')
for line in b:
    (id, rest) = line.split(',', 1)
    res[id] = rest  # rows from b.csv overwrite rows with the same id
b.close()
c = open('c.csv', 'w')
for id, rest in res.items():
    c.write(id + "," + rest)
c.close()
Basically, you're using the first column of each line as the key in the dictionary res. Because b.csv is the second file, keys that already existed in the first file (a.csv) will be overwritten. Finally, you merge key and rest together again in the output file c.csv.
Also, the header row will be taken from the second file, but the headers should not differ anyway, I guess.
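To illustrate why splitting on the first comma only is enough here, even when later values contain commas, this is a minimal sketch. The sample lines are modeled on the question, and the helper name split_id is hypothetical. It assumes the id itself never contains a comma and that no quoted value spans several lines.

```python
def split_id(line):
    """Split a CSV line into (id, rest) at the first comma only."""
    row_id, rest = line.split(',', 1)
    return row_id, rest

res = {}
sample_lines = [
    '"54","34,2,3","blah"\n',    # as if read from the first file
    '"54","new,value","bleh"\n',  # same id, as if read from the second file
]
for line in sample_lines:
    row_id, rest = split_id(line)
    res[row_id] = rest  # a later line with the same id replaces the earlier one

print(res)
```

Only one entry remains for "54", and its quoted value still contains the comma, because everything after the first comma is stored untouched.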
Edit: A slightly different solution that merges an arbitrary number of files and outputs rows in order:
res = {}
files_to_merge = ['a.csv', 'b.csv']
for filename in files_to_merge:
    f = open(filename)
    for line in f:
        (id, rest) = line.split(',', 1)
        if rest[-1] != '\n':  # last line may be missing a newline
            rest = rest + '\n'
        res[id] = rest
    f.close()
f = open('c.csv', 'w')
f.write("\"id\"," + res["\"id\""])  # write the header row first
del res["\"id\""]
for id, rest in sorted(res.items()):  # items() in Python 3; iteritems() in Python 2
    f.write(id + "," + rest)
f.close()
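One caveat with the sorted() call above: the keys are quoted strings, so they are ordered as text, not as numbers. If numeric order is wanted, a sort key along these lines might help. This is only a sketch: numeric_id is a hypothetical helper, and it assumes every id is an integer wrapped in double quotes.

```python
def numeric_id(item):
    """Sort key: strip the surrounding quotes from the id and compare as an integer."""
    quoted_id, _rest = item
    return int(quoted_id.strip('"'))

res = {'"10"': '"x"\n', '"4"': '"bleh"\n', '"1"': '"blah"\n'}

text_order = [k for k, _ in sorted(res.items())]
number_order = [k for k, _ in sorted(res.items(), key=numeric_id)]

print(text_order)    # ids compared as strings
print(number_order)  # ids compared as integers
```

With string comparison "10" sorts before "4"; with the key function the order is 1, 4, 10.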
Keeping key order, and maintaining the last row based on id, you can do something like:
import csv
from collections import OrderedDict
from itertools import chain
incsv = [csv.DictReader(open(fname)) for fname in ('/home/jon/tmp/test1.txt', '/home/jon/tmp/test2.txt')]
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(incsv))
for row in rows.values():  # write out to new file or whatever here instead
    print(row)
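The loop above only prints the merged rows. One possible way to write them to a file instead is csv.DictWriter, sketched below. The function name merge_to_file and its parameters are hypothetical, and the sketch assumes that all input files share the same header and that no row has more values than the header has columns.

```python
import csv
from collections import OrderedDict
from itertools import chain

def merge_to_file(in_paths, out_path, key='id'):
    """Merge CSV files by `key`; later files win, first-seen order is kept."""
    files = [open(p, newline='') for p in in_paths]
    try:
        readers = [csv.DictReader(f) for f in files]
        # A repeated key keeps its original position but takes the newest row.
        rows = OrderedDict((row[key], row) for row in chain.from_iterable(readers))
        fieldnames = readers[0].fieldnames  # assumption: one shared header
    finally:
        for f in files:
            f.close()
    with open(out_path, 'w', newline='') as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows.values())
```

It could be called as merge_to_file(['a.csv', 'b.csv'], 'c.csv'). Because the csv module does the parsing, quoted values that contain commas are read and written back intact.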
import csv
with open("a.csv", newline="") as a:
    fields = next(a)  # keep the raw header line
    D = {k: v for k, *v in csv.reader(a)}
with open("b.csv", newline="") as b:
    next(b)  # skip the header
    D.update({k: v for k, *v in csv.reader(b)})  # rows from b.csv win
with open("c.csv", "w", newline="") as c:
    c.write(fields)
    csv.writer(c, quoting=csv.QUOTE_ALL).writerows([k] + v for k, v in D.items())