
Using pandas to join specific elements from three separate CSV files into one CSV file

I have the following three CSV files:

1.csv:
id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV

2.csv:
id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV

3.csv:
id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV

The goal is to create a single CSV file formatted as follows:

id,status,successPct24,successPctMonth,env

So, using my example CSV files, the single CSV should look like this:

aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,92.53,99.91,DEV

I've attempted to use the following Python code to accomplish this...

import pandas as pd

csv1 = pd.read_csv("1.csv", index_col=[0], usecols=["id", "status"])

csv2 = pd.read_csv("2.csv", index_col=[0], usecols=["id", "successPct24"])

csv3 = pd.read_csv("3.csv", index_col=[0], usecols=["id", "successPctMonth", "env"])

firstcsv = csv1.join(csv2)

finalcsv = firstcsv.join(csv3)

# print (finalcsv)

finalcsv.to_csv('4.csv', index=True)

...but the resulting single CSV is not correct:

aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV

I'm sure there's a parameter I'm missing, or something I've misconfigured. Any assistance with this request would be greatly appreciated.

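The join method lines rows up by index, and in your DataFrames several rows share the same index value (each id appears twice). For a duplicated key, join pairs every matching row on the left with every matching row on the right, which is why the output multiplies. If you need to join on more than one column, use merge instead.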
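To see the duplication on a small scale, here is a minimal sketch that rebuilds 1.csv and 2.csv in memory with io.StringIO (the inline strings are stand-ins for the real files) and repeats the first join from the question:

```python
import io

import pandas as pd

# In-memory stand-ins for 1.csv and 2.csv from the question
data1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
data2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""

# Same read_csv calls as in the question: "id" becomes the index, "env" is dropped
csv1 = pd.read_csv(io.StringIO(data1), index_col=[0], usecols=["id", "status"])
csv2 = pd.read_csv(io.StringIO(data2), index_col=[0], usecols=["id", "successPct24"])

print(csv1.index.is_unique)  # False: "aaaa" and "bbbb" each appear twice

# Each "aaaa" row on the left is paired with both "aaaa" rows on the right
joined = csv1.join(csv2)
print(len(joined))  # 8, i.e. 2 x 2 rows for each of the two ids
```

Joining the third file multiplies the rows again (2 x 2 x 2 = 8 per id), which gives the 16 lines shown in the question.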

The first solution is simply to assign the columns, but this only works if the rows appear in the same order in all three files:

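# csv1, csv2 and csv3 as read in the question (indexed by id).
# This only works if all three files list their rows in exactly the same order.
temp = csv1.copy()
temp['successPct24'] = csv2['successPct24']
temp['successPctMonth'] = csv3['successPctMonth']
temp['env'] = csv3['env']

print(temp)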
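Because that assignment silently relies on the row order, a minimal sketch that first checks that the (id, env) pairs line up and only then copies the columns by position might look like this. The inline strings stand in for the real files, and the ValueError message is illustrative:

```python
import io

import pandas as pd

# In-memory stand-ins for the three files from the question
data1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
data2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
data3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

csv1 = pd.read_csv(io.StringIO(data1))
csv2 = pd.read_csv(io.StringIO(data2))
csv3 = pd.read_csv(io.StringIO(data3))

# Copying columns by position is only valid if every file lists the
# same (id, env) pairs in the same order, so check that first
key = ["id", "env"]
for other in (csv2, csv3):
    if not csv1[key].equals(other[key]):
        raise ValueError("rows are not in the same order; use merge instead")

temp = csv1[["id", "status"]].copy()
# .to_numpy() copies values by position, ignoring index labels
temp["successPct24"] = csv2["successPct24"].to_numpy()
temp["successPctMonth"] = csv3["successPctMonth"].to_numpy()
temp["env"] = csv1["env"].to_numpy()
print(temp)
```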

The second solution is to use merge. The id column alone is not enough to identify a row, so you also need to match on the env column:

import pandas as pd

csv1 = pd.read_csv("1.csv", usecols=["id", "status", "env"])
csv2 = pd.read_csv("2.csv", usecols=["id", "successPct24", "env"])
csv3 = pd.read_csv("3.csv", usecols=["id", "successPctMonth", "env"])

# id alone is not unique, so match rows on the (id, env) pair
firstcsv = csv1.merge(csv2, on=["id", "env"])
finalcsv = firstcsv.merge(csv3, on=["id", "env"])

finalcsv.set_index('id', inplace=True)
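If id and env together are supposed to identify a row, merge can also check that for you. The sketch below (again with inline stand-ins for the files) passes validate="one_to_one", so a duplicated (id, env) pair raises pandas.errors.MergeError instead of silently multiplying rows as in the question:

```python
import io

import pandas as pd

# In-memory stand-ins for the three files from the question
data1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
data2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
data3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

csv1 = pd.read_csv(io.StringIO(data1), usecols=["id", "status", "env"])
csv2 = pd.read_csv(io.StringIO(data2), usecols=["id", "successPct24", "env"])
csv3 = pd.read_csv(io.StringIO(data3), usecols=["id", "successPctMonth", "env"])

key = ["id", "env"]
# validate="one_to_one" raises MergeError if (id, env) repeats in either frame
finalcsv = (
    csv1.merge(csv2, on=key, validate="one_to_one")
    .merge(csv3, on=key, validate="one_to_one")
    .set_index("id")
)
print(finalcsv)

# Append a copy of the first row of csv2 to simulate a duplicated key
dup2 = pd.concat([csv2, csv2.head(1)], ignore_index=True)
try:
    csv1.merge(dup2, on=key, validate="one_to_one")
    duplicate_caught = False
except pd.errors.MergeError as exc:
    duplicate_caught = True
    print("duplicate key detected:", exc)
```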

You need to join on two columns, 'id' and 'env':

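import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df3 = pd.read_csv("3.csv")
# how='left' keeps every row of 1.csv, even one with no match in 2.csv or 3.csv
finalcsv = df1.merge(df2, how='left', on=['id', 'env']).merge(df3, how='left', on=['id', 'env'])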

Result:

    id      status  env     successPct24    successPctMonth
0   aaaa    PASS    PROD    99.73           99.70
1   aaaa    PASS    DEV     99.89           99.90
2   bbbb    PASS    PROD    100.00          100.00
3   bbbb    PASS    DEV     92.53           99.91
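The chained merge extends to any number of files. A minimal sketch using functools.reduce; merge_all is a hypothetical helper name and the inline strings stand in for the three files:

```python
import io
from functools import reduce

import pandas as pd

# In-memory stand-ins for the three files from the question
data1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
data2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
data3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""


def merge_all(dfs, keys):
    """Hypothetical helper: left-merge every frame onto the first, matching on keys."""
    return reduce(lambda left, right: left.merge(right, how="left", on=keys), dfs)


frames = [pd.read_csv(io.StringIO(text)) for text in (data1, data2, data3)]
key = ["id", "env"]

finalcsv = merge_all(frames, key)
print(finalcsv)

# With how="left", an (id, env) pair missing from a later file gives NaN
# instead of dropping the row: drop the last row (bbbb, DEV) of the third file
with_gap = merge_all(frames[:2] + [frames[2].iloc[:3]], key)
print(with_gap["successPctMonth"].isna().sum())  # 1
```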

If you need the columns in a different order, select them explicitly:

finalcsv = finalcsv[['id', 'status', 'successPct24', 'successPctMonth', 'env']]
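To produce the exact file the question asks for, write the reordered frame with index=False so the default 0..3 row labels are not added as an extra first column. A minimal end-to-end sketch with inline stand-ins for the files; to write to disk you would pass a path such as "4.csv" to to_csv:

```python
import io

import pandas as pd

# In-memory stand-ins for the three files from the question
data1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
data2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
data3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

key = ["id", "env"]
df1 = pd.read_csv(io.StringIO(data1))
df2 = pd.read_csv(io.StringIO(data2))
df3 = pd.read_csv(io.StringIO(data3))

finalcsv = df1.merge(df2, how="left", on=key).merge(df3, how="left", on=key)
finalcsv = finalcsv[["id", "status", "successPct24", "successPctMonth", "env"]]

# With no path argument, to_csv returns the CSV text instead of writing a file
csv_text = finalcsv.to_csv(index=False)
print(csv_text)
```

This prints the header id,status,successPct24,successPctMonth,env followed by the four rows of the target output, starting with aaaa,PASS,99.73,99.7,PROD.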
