
How to merge two csv files?

I have two csv files like this

"id","h1","h2","h3", ...
"1","blah","blahla"
"4","bleh","bleah"

I'd like to merge the two files so that if there's the same id in both files, the values of the row should come from the second file. If they have different ids, then the merged file should contain both rows.


Some values contain commas:

"54","34,2,3","blah"
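# Map each id (the first column) to the rest of its line.
res = {}

a = open('a.csv')
for line in a:
    (id, rest) = line.split(',', 1)
    res[id] = rest
a.close()

# Rows from b.csv overwrite rows from a.csv that share an id.
b = open('b.csv')
for line in b:
    (id, rest) = line.split(',', 1)
    res[id] = rest
b.close()

# Write the merged rows to the output file.
c = open('c.csv', 'w')
for id, rest in res.items():
    c.write(id + "," + rest)
c.close()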

Basically you're using the first column of each line as the key in the dictionary res. Because b.csv is read second, any key that already existed from the first file (a.csv) is overwritten. Finally, key and rest are joined together again in the output file c.csv. Since the line is split only at the first comma, commas inside later quoted values are left untouched.

The header row is also taken from the second file, but the two headers should not differ anyway, I guess.
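For comparison, here is a minimal sketch of the same "last file wins" idea using the csv module instead of splitting lines by hand. The function name merge_csv_text and the sample data are made up for illustration, and it works on in-memory strings rather than files so it can be run as-is. It assumes Python 3.7+, where a plain dict keeps insertion order.

```python
import csv
import io

def merge_csv_text(first, second):
    """Merge two CSV texts; rows from `second` win when ids collide."""
    merged = {}
    header = None
    for text in (first, second):
        reader = csv.reader(io.StringIO(text))
        header = next(reader)          # the header of the later input is kept
        for row in reader:
            if row:                    # skip blank lines
                merged[row[0]] = row   # key on the first column (the id)
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(merged.values())
    return out.getvalue()

# Hypothetical sample data; id "4" appears in both inputs.
a = '"id","h1","h2"\n"1","blah","blahla"\n"4","bleh","bleah"\n'
b = '"id","h1","h2"\n"4","new","34,2,3"\n"7","x","y"\n'
print(merge_csv_text(a, b))
```

The row for id 4 comes from the second input, and the value 34,2,3 stays inside one quoted field because the csv module does the parsing and quoting.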

Edit: A slightly different solution that merges an arbitrary number of files and outputs rows in order:

res = {}
files_to_merge = ['a.csv', 'b.csv']
for filename in files_to_merge:
    f=open(filename)
    for line in f:
        (id, rest) = line.split(',', 1)
        if rest[-1] != '\n': #last line may be missing a newline
            rest = rest + '\n'
        res[id] = rest
    f.close()

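f = open('c.csv', 'w')
# Write the header first, then drop it so it is not sorted in with the data rows.
f.write("\"id\"," + res["\"id\""])
del res["\"id\""]
for id, rest in sorted(res.items()):  # items() works on both Python 2 and 3
    f.write(id + "," + rest)
f.close()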
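One caveat with the sorted() call above: the keys are quoted strings, so they are compared as text and "10" sorts before "4". If the ids are known to be integers (an assumption, the question does not say), a key function can sort them numerically. A small sketch with made-up data, where numeric_id is a hypothetical helper:

```python
# Made-up rows in the same shape as res above: quoted id -> rest of the line.
res = {'"10"': '"a"\n', '"4"': '"b"\n', '"1"': '"c"\n'}

def numeric_id(item):
    """Sort key: strip the surrounding quotes and compare ids as integers."""
    key, _rest = item
    return int(key.strip('"'))

lexicographic = [k for k, _ in sorted(res.items())]
numeric = [k for k, _ in sorted(res.items(), key=numeric_id)]

print(lexicographic)  # ['"1"', '"10"', '"4"']
print(numeric)        # ['"1"', '"4"', '"10"']
```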

To keep the key order while retaining the last row seen for each id, you can do something like:

import csv
from collections import OrderedDict
from itertools import chain

incsv = [csv.DictReader(open(fname)) for fname in ('/home/jon/tmp/test1.txt', '/home/jon/tmp/test2.txt')]
# Later rows replace earlier ones with the same id, but the id keeps its original position.
rows = OrderedDict((row['id'], row) for row in chain.from_iterable(incsv))
for row in rows.values():  # write out to new file or whatever here instead
    print(row)
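To make the ordering behaviour concrete, here is a small sketch with made-up in-memory data instead of the files above. It uses a plain dict, which keeps insertion order on Python 3.7+ (an assumption about the interpreter version); an id that appears again keeps its first position but takes the values of the later row.

```python
import csv
import io
from itertools import chain

# Hypothetical stand-ins for the two input files; id "4" appears in both.
first = io.StringIO('"id","h1"\n"1","a"\n"4","b"\n')
second = io.StringIO('"id","h1"\n"4","B"\n"2","c"\n')

readers = [csv.DictReader(f) for f in (first, second)]
rows = {row["id"]: row for row in chain.from_iterable(readers)}

ids = list(rows)
print(ids)              # ['1', '4', '2']
print(rows["4"]["h1"])  # B
```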

Python 3:

import csv

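# newline="" lets the csv module handle line endings itself.
with open("a.csv", newline="") as a:
    reader = csv.reader(a)
    fields = next(reader)  # header row
    D = {k: v for k, *v in reader}

with open("b.csv", newline="") as b:
    reader = csv.reader(b)
    next(reader)  # skip the header of the second file
    D.update({k: v for k, *v in reader})  # rows from b.csv win on duplicate ids

with open("c.csv", "w", newline="") as c:
    writer = csv.writer(c, quoting=csv.QUOTE_ALL)
    writer.writerow(fields)
    writer.writerows([k] + v for k, v in D.items())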
