Using pandas to join specific elements from three separate CSV files into one CSV file

Question

I have the following three CSV files:

1.csv:
id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV

2.csv:
id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV

3.csv:
id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV

The goal is to create a single CSV file formatted as follows:

id,status,successPct24,successPctMonth,env

So, using my example CSV files, the single CSV should look like this:

aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,92.53,99.91,DEV

I've attempted to use the following Python code to accomplish this...

import pandas as pd

csv1 = pd.read_csv("1.csv", index_col=[0], usecols=["id", "status"])

csv2 = pd.read_csv("2.csv", index_col=[0], usecols=["id", "successPct24"])

csv3 = pd.read_csv("3.csv", index_col=[0], usecols=["id", "successPctMonth", "env"])

firstcsv = csv1.join(csv2)

finalcsv = firstcsv.join(csv3)

# print (finalcsv)

finalcsv.to_csv('4.csv', index=True)

...but the resulting single CSV is not correct:

aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV

I'm sure there's a parameter I'm missing, or something I've misconfigured. Any assistance with this request would be greatly appreciated.

Answer 1

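The join function always matches rows on the index, and here several rows share the same index value (each id appears twice), so every row for an id gets paired with every row for that id in the other frame. If you need to join on more than one column, use merge instead.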
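To make the row multiplication concrete, here is a small sketch that reproduces it with the first two files from the question. The CSV contents are inlined through io.StringIO so the snippet runs on its own; the names CSV1, CSV2, left and right are just for illustration.

```python
import io
import pandas as pd

# Inline copies of 1.csv and 2.csv from the question.
CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""

# Same setup as the question: id becomes the index, env is dropped.
left = pd.read_csv(io.StringIO(CSV1), index_col="id", usecols=["id", "status"])
right = pd.read_csv(io.StringIO(CSV2), index_col="id", usecols=["id", "successPct24"])

# The index is not unique, so join pairs each left row with every
# right row that has the same id: 2 x 2 = 4 rows per id.
joined = left.join(right)

print(left.index.is_unique)                   # False
print(len(left), len(right), len(joined))     # 4 4 8
```

Joining the third file the same way doubles the rows again, which accounts for the 16 rows in the question's output.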

The first solution is simply to assign the columns, but this only works if the rows are in the same order in every file:

temp = csv1.copy() 
temp['successPct24'] = csv2['successPct24']
temp['successPctMonth'] = csv3['successPctMonth']
temp['env'] = csv3['env']

print(temp)
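Since plain assignment depends on row order, it may be worth checking that assumption before copying anything. The following is only a sketch of one way to do that, not part of the original answer: it compares the (id, env) columns of two frames and copies the values by position only when they line up. The data is inlined via io.StringIO, and the variable names are made up for the example.

```python
import io
import pandas as pd

CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""

df1 = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

keys = ["id", "env"]

# True only if both files list the same (id, env) pairs in the same order.
same_order = df1[keys].equals(df2[keys])

if not same_order:
    raise ValueError("rows are not aligned; use merge on id and env instead")

combined = df1.copy()
# to_numpy() copies the values by position and skips index alignment.
combined["successPct24"] = df2["successPct24"].to_numpy()
print(combined)
```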

The second solution is to use merge. The id alone is not a sufficient key, so you also need to merge on the env column:

import pandas as pd

# Read id and env from every file so both can serve as merge keys
csv1 = pd.read_csv("1.csv", usecols=["id", "status", "env"])
csv2 = pd.read_csv("2.csv", usecols=["id", "successPct24", "env"])
csv3 = pd.read_csv("3.csv", usecols=["id", "successPctMonth", "env"])

# Merge on the (id, env) pair, which identifies each row uniquely
firstcsv = csv1.merge(csv2, on=["id", "env"])
finalcsv = firstcsv.merge(csv3, on=["id", "env"])

finalcsv.set_index('id', inplace=True)
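If you want pandas to complain when the chosen key is not unique, merge accepts a validate argument. The sketch below (my addition, with the data inlined via io.StringIO) shows that the (id, env) pair passes a one-to-one check while id alone does not; the flag name id_alone_is_unique is only for illustration.

```python
import io
import pandas as pd
from pandas.errors import MergeError

CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""

df1 = pd.read_csv(io.StringIO(CSV1))
df2 = pd.read_csv(io.StringIO(CSV2))

# (id, env) is unique on both sides, so the one-to-one check passes.
merged = df1.merge(df2, on=["id", "env"], validate="one_to_one")
print(len(merged))  # 4

# id alone is duplicated, so the same check raises MergeError
# instead of silently multiplying rows.
try:
    df1.merge(df2, on="id", validate="one_to_one")
    id_alone_is_unique = True
except MergeError:
    id_alone_is_unique = False

print(id_alone_is_unique)  # False
```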

Answer 2

You need to join on two columns, 'id' and 'env'. Code:

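import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df3 = pd.read_csv("3.csv")
# Left-merge on the (id, env) pair so each row of df1 matches exactly one row in df2 and df3
finalcsv = df1.merge(df2, how='left', on=['id', 'env']).merge(df3, how='left', on=['id', 'env'])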

Result:

    id      status  env     successPct24    successPctMonth
0   aaaa    PASS    PROD    99.73           99.70
1   aaaa    PASS    DEV     99.89           99.90
2   bbbb    PASS    PROD    100.00          100.00
3   bbbb    PASS    DEV     92.53           99.91

If you need a different column order:

finalcsv = finalcsv[['id', 'status', 'successPct24', 'successPctMonth', 'env']]
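Putting the pieces together, here is a rough end-to-end sketch that merges all three files, reorders the columns and produces the CSV text the question asks for. It is not from the original answer: the data is inlined via io.StringIO, functools.reduce is just one way to chain the merges, and to_csv is called without a path so it returns a string (pass '4.csv' instead to write the file).

```python
import io
from functools import reduce

import pandas as pd

SOURCES = [
    """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
""",
    """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
""",
    """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
""",
]

frames = [pd.read_csv(io.StringIO(text)) for text in SOURCES]
keys = ["id", "env"]

# Chain left merges on (id, env) across all frames.
merged = reduce(lambda acc, df: acc.merge(df, how="left", on=keys), frames)

# Put env last, as in the requested layout.
merged = merged[["id", "status", "successPct24", "successPctMonth", "env"]]

# index=False keeps the 0..3 row labels out of the output.
csv_text = merged.to_csv(index=False)
print(csv_text)
```

The printed text has the header line followed by the four rows from the question, with the floats written as 99.73, 99.7 and so on.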

2 answers

solution1
2 2020-04-03 17:09:05

solution2
2 ACCEPTED 2020-04-03 17:13:00
