简体   繁体   中英

How to merge two csv files line by line

I'm using python 3.0 in windows.

i have 2 csv files

file1.csv

0, 10,12
0,11,12
1,15,12
2, 17,12

file2.csv

0, 2, 1
1,22, 1
3, 11, 1

output.csv

0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1

i tried the following code

a = pd.read_csv('file1.csv')
b = pd.read_csv('file2.csv')
c = pd.concat([a, b], join='outer')
c.to_csv("Output.csv", index = False)

But my output is

0, 10,12
0,11,12
1,15,12
2, 17,12
0, 2, 1
1,22, 1
3, 11, 1

Can you give me some pointers please. I'm new to python.

You can concatenate and sort them with pandas:

df = pd.concat([a, b]).astype('str')

df = df.sort_values(list(df.columns))

df.to_csv('Output.csv', index=False)

This does not create the output file but it demonstrates how heapq.merge could help:

from heapq import merge
inputs = [file(f) for f in ['file1.csv', 'file2.csv']]
for line in merge(*inputs):
    print line,

With the sample data this produces

0, 10,12
0, 2, 1
0,11,12
1,15,12
1,22, 1
2, 17,12
3, 11, 1

However this differs from the sample output in the ordering of the initial lines:

0, 10,12
0,11,12
0, 2, 1

but I'm not sure how to produce this ordering. The sample output lines do not appear to be ordered by character or numeric columns (a numeric ordering of the fields would presumably put 0, 2, 1 first).

EDIT: it appears the lines are ordered as if spaces were not present. The following example:

from heapq import merge
def reader(f):
    for line in file(f):
        yield line.replace(' ',''), line
inputs = [reader(f) for f in ['file1.csv', 'file2.csv']]
for pair in merge(*inputs):
    print pair[1],

generates this ordering:

0, 10,12
0,11,12
0, 2, 1
1,15,12
1,22, 1
2, 17,12
3, 11, 1

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM