
Python - How to join multiple csv files sharing similar data, but in additional columns?

I need to combine multiple .csv files into one. They share most of the names in the first column, but the values in the second column change from file to file. My struggle: I want the first column to be extended (appended to) whenever a file introduces a new name, and I want each file's second-column values placed in a new column of their own, aligned with the matching name in the first column.

Example:

Dataset1.csv 

plane1,100
plane2,100
plane3,400
plane5,600
plane4,700

Dataset2.csv

plane1,150
plane3,100
plane4,300

Dataset3.csv

plane3,300
plane4,250
plane6,180

I want them to end up as:

output.csv

plane1,100,150,-
plane2,100,-,-
plane3,400,100,300
plane4,700,300,250
plane5,600,-,-
plane6,-,-,180

Any help is appreciated.

This solves the exact issue you seem to be having:

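import pandas as pd

# Read each headerless file, using the first column (the plane name) as the index.
df1 = pd.read_csv('Dataset1.csv', header=None, index_col=0)
df2 = pd.read_csv('Dataset2.csv', header=None, index_col=0)
df3 = pd.read_csv('Dataset3.csv', header=None, index_col=0)

# Concatenate side by side; rows are aligned on the index and gaps become NaN.
df = pd.concat([df1, df2, df3], axis=1)

# Write without a header row and mark missing values with "-".
df.to_csv('output.csv', header=False, na_rep='-')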
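
The same idea extends to any number of files. Here is a minimal sketch, assuming every file has two headerless columns and that each name appears at most once per file. The helper `combine_csvs` and the column labels `file1`, `file2`, `file3` are names made up for this example, and in-memory `io.StringIO` objects stand in for the real files so the snippet runs on its own.

```python
import io

import pandas as pd


def combine_csvs(sources, labels):
    """Outer-join two-column, headerless CSVs on their first column."""
    frames = [
        pd.read_csv(src, header=None, index_col=0, names=["plane", label])
        for src, label in zip(sources, labels)
    ]
    # axis=1 aligns rows on the index (the plane name);
    # names missing from a file become NaN in that file's column.
    return pd.concat(frames, axis=1).sort_index()


# In-memory stand-ins for Dataset1.csv, Dataset2.csv and Dataset3.csv.
sources = [
    io.StringIO("plane1,100\nplane2,100\nplane3,400\nplane5,600\nplane4,700\n"),
    io.StringIO("plane1,150\nplane3,100\nplane4,300\n"),
    io.StringIO("plane3,300\nplane4,250\nplane6,180\n"),
]
combined = combine_csvs(sources, ["file1", "file2", "file3"])

# Nullable integers keep 100 from being written as 100.0;
# na_rep fills the gaps with "-".
csv_text = combined.astype("Int64").to_csv(header=False, na_rep="-")
print(csv_text)
```

With real files, pass the paths instead of the `StringIO` objects, and pass `'output.csv'` as the first argument of `to_csv` to write the result to disk.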

Say you have 3 dataframes.

df1:

df1 = pd.DataFrame({'plane':['plane1','plane2','plane3','plane4','plane5'],
                  'value':[100,100,400,600,700]})

Output:

    plane   value
0   plane1  100
1   plane2  100
2   plane3  400
3   plane4  600
4   plane5  700

df2:

df2 = pd.DataFrame({'plane':['plane1','plane3','plane4'],
                  'value':[150,100,300]})

Output:

    plane   value
0   plane1  150
1   plane3  100
2   plane4  300

df3:

df3 = pd.DataFrame({'plane':['plane3','plane4','plane6'],
                  'value':[300,250,180]})

Output:

    plane   value
0   plane3  300
1   plane4  250
2   plane6  180

Run:

mid_res = pd.merge(df1,df2,how='outer',on='plane') 
result = pd.merge(mid_res,df3,how='outer',on='plane')

Output:

    plane   value_x value_y value
0   plane1  100.0   150.0   NaN
1   plane2  100.0   NaN     NaN
2   plane3  400.0   100.0   300.0
3   plane4  600.0   300.0   250.0
4   plane5  700.0   NaN     NaN
5   plane6  NaN     NaN     180.0
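
Chaining pd.merge by hand gets awkward beyond three frames, and the automatic `_x`/`_y` suffixes can collide once a fourth 'value' column is merged in. One way around this, as a rough sketch, is to rename each value column first and fold the list with `functools.reduce`. It assumes every frame has the columns 'plane' and 'value'; the helper `merge_all` and the column names `value1`, `value2`, ... are illustrative choices, not part of the original answer.

```python
from functools import reduce

import pandas as pd

df1 = pd.DataFrame({"plane": ["plane1", "plane2", "plane3", "plane4", "plane5"],
                    "value": [100, 100, 400, 600, 700]})
df2 = pd.DataFrame({"plane": ["plane1", "plane3", "plane4"],
                    "value": [150, 100, 300]})
df3 = pd.DataFrame({"plane": ["plane3", "plane4", "plane6"],
                    "value": [300, 250, 180]})


def merge_all(frames, key="plane"):
    """Outer-merge any number of frames on `key`, one value column per frame."""
    # Give each frame's value column a unique name so no suffixes are needed.
    renamed = [
        frame.rename(columns={"value": f"value{i}"})
        for i, frame in enumerate(frames, start=1)
    ]
    # Fold the list pairwise: ((df1 merge df2) merge df3) ...
    return reduce(
        lambda left, right: pd.merge(left, right, how="outer", on=key),
        renamed,
    )


result = merge_all([df1, df2, df3])
print(result)
```

The result has the same rows as the two-step merge above, with columns named `value1`, `value2` and `value3` instead of `value_x`, `value_y` and `value`.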

If you want NaN shown as "-", run:

result = result.fillna('-')

Get:

    plane   value_x value_y value
0   plane1  100.0   150.0   -
1   plane2  100.0   -       -
2   plane3  400.0   100.0   300.0
3   plane4  600.0   300.0   250.0
4   plane5  700.0   -       -
5   plane6  -       -       180.0

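Now you can export the result to a CSV file (without the row index or header row, to match the layout in the question):

result.to_csv('result.csv', index=False, header=False)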

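The concat solution works only when the values in column 'plane' are unique within each file. pd.concat(..., axis=1) aligns rows on the index, and it will typically raise an error if an index contains duplicates.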
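
If a file might repeat a name, you can check for that and collapse the repeats before combining. This is only a sketch: which row to keep is a policy decision, and keeping the last occurrence is an assumption made here, as are the helper name `dedupe` and the sample data.

```python
import pandas as pd


def dedupe(frame, key="plane"):
    """Keep only the last row for each repeated key (an arbitrary policy)."""
    return frame.drop_duplicates(subset=key, keep="last")


# Hypothetical data in which plane1 appears twice.
df = pd.DataFrame({"plane": ["plane1", "plane1", "plane2"],
                   "value": [100, 120, 300]})

has_repeats = not df["plane"].is_unique  # True for this frame
clean = dedupe(df) if has_repeats else df
print(clean)
```

After this step each name appears once, so the frame can safely be indexed by 'plane' and passed to either the concat or the merge approach.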

