I have the following three CSV files:
1.csv:
id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
2.csv:
id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
3.csv:
id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
The goal is to create a single CSV file formatted as follows:
id,status,successPct24,successPctMonth,env
So, using my example CSV files, the single CSV should look like this:
aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,92.53,99.91,DEV
I've attempted to use the following Python code to accomplish this...
import pandas as pd
csv1 = pd.read_csv("1.csv", index_col=[0], usecols=["id", "status"])
csv2 = pd.read_csv("2.csv", index_col=[0], usecols=["id", "successPct24"])
csv3 = pd.read_csv("3.csv", index_col=[0], usecols=["id", "successPctMonth", "env"])
firstcsv = csv1.join(csv2)
finalcsv = firstcsv.join(csv3)
# print (finalcsv)
finalcsv.to_csv('4.csv', index=True)
...but the resulting single CSV is not correct:
aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
aaaa,PASS,99.73,99.7,PROD
aaaa,PASS,99.73,99.9,DEV
aaaa,PASS,99.89,99.7,PROD
aaaa,PASS,99.89,99.9,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV
bbbb,PASS,100.0,100.0,PROD
bbbb,PASS,100.0,99.91,DEV
bbbb,PASS,92.53,100.0,PROD
bbbb,PASS,92.53,99.91,DEV
I'm sure there's a parameter I'm missing, or something I've misconfigured. Any assistance with this request would be greatly appreciated.
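The join function always joins on the index, and you have multiple records sharing the same index value (each id appears twice). As a result, every row is matched with every row on the other side that has the same id, which multiplies the rows. If you need to join on more than one column, use merge instead.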
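To see the effect, here is a minimal sketch that rebuilds the three files from the question in memory. io.StringIO stands in for 1.csv, 2.csv and 3.csv, and CSV1/CSV2/CSV3 are illustrative names holding the question's sample data. It reproduces the 16-row result by indexing on id alone, then shows that indexing on both id and env lets join match rows one-to-one.

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for the real files.
CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
CSV3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

# The question's approach: index on 'id' only, so each label appears twice.
d1 = pd.read_csv(io.StringIO(CSV1), index_col="id", usecols=["id", "status"])
d2 = pd.read_csv(io.StringIO(CSV2), index_col="id", usecols=["id", "successPct24"])
d3 = pd.read_csv(io.StringIO(CSV3), index_col="id", usecols=["id", "successPctMonth", "env"])
dup = d1.join(d2).join(d3)
print(len(dup))  # 16: each id matches 2 x 2 x 2 rows

# Index on both key columns instead, so every (id, env) pair is unique
# and join pairs each row with exactly one partner.
k1, k2, k3 = (pd.read_csv(io.StringIO(t), index_col=["id", "env"]) for t in (CSV1, CSV2, CSV3))
joined = k1.join(k2).join(k3)
print(len(joined))  # 4
print(joined)
```

Keeping join but switching to a two-level index is an alternative to merge when you prefer index-based operations; merge, shown below, does the same matching on ordinary columns.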
The first solution is simply to assign the columns, but this only works if the rows appear in the same order in all three files:
temp = csv1.copy()
temp['successPct24'] = csv2['successPct24']
temp['successPctMonth'] = csv3['successPctMonth']
temp['env'] = csv3['env']
print(temp)
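Because this relies on row order, it can be worth checking that the key columns really line up before copying values across. The sketch below is one way to do that, assuming the files are read without index_col so every frame gets the same default RangeIndex; the inline CSV1/CSV2/CSV3 strings are illustrative stand-ins for the real files.

```python
import io

import pandas as pd

# Sample data from the question; replace io.StringIO(...) with the file paths.
CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
CSV3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

csv1 = pd.read_csv(io.StringIO(CSV1))
csv2 = pd.read_csv(io.StringIO(CSV2))
csv3 = pd.read_csv(io.StringIO(CSV3))

keys = ["id", "env"]
# Column assignment aligns on the index, not on id/env, so confirm the
# key columns are identical row by row before copying values across.
for other in (csv2, csv3):
    if not csv1[keys].equals(other[keys]):
        raise ValueError("rows are not in the same (id, env) order")

temp = csv1.copy()
temp["successPct24"] = csv2["successPct24"]
temp["successPctMonth"] = csv3["successPctMonth"]
temp = temp[["id", "status", "successPct24", "successPctMonth", "env"]]
print(temp)
```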
The second solution is to use merge, but the id alone is not a sufficient key, so you also need to use the env column:
import pandas as pd

csv1 = pd.read_csv("1.csv", usecols=["id", "status", "env"])
csv2 = pd.read_csv("2.csv", usecols=["id", "successPct24", "env"])
csv3 = pd.read_csv("3.csv", usecols=["id", "successPctMonth", "env"])
# Merge on the (id, env) pair so each row has exactly one match
firstcsv = csv1.merge(csv2, on=["id", "env"])
finalcsv = firstcsv.merge(csv3, on=["id", "env"])
finalcsv.set_index('id', inplace=True)
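If you want pandas to catch the original mistake for you, merge accepts a validate argument. A sketch, again with the sample data inlined through io.StringIO (the CSV1/CSV2/CSV3 names are illustrative): validate="one_to_one" makes merge raise pandas.errors.MergeError when a key is duplicated on either side, instead of silently multiplying rows.

```python
import io

import pandas as pd
from pandas.errors import MergeError

# Sample data from the question; replace io.StringIO(...) with the file paths.
CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
CSV3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

csv1 = pd.read_csv(io.StringIO(CSV1))
csv2 = pd.read_csv(io.StringIO(CSV2))
csv3 = pd.read_csv(io.StringIO(CSV3))

keys = ["id", "env"]
# Each (id, env) pair is unique, so the one-to-one check passes.
finalcsv = (
    csv1.merge(csv2, on=keys, validate="one_to_one")
    .merge(csv3, on=keys, validate="one_to_one")
    .set_index("id")
)
print(finalcsv)

# Merging on 'id' alone fails the check because each id appears twice.
raised = False
try:
    csv1.merge(csv2[["id", "successPct24"]], on="id", validate="one_to_one")
except MergeError as exc:
    raised = True
    print("MergeError:", exc)
```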
You need to join on two columns, 'id' and 'env'.
Code:
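import pandas as pd

df1 = pd.read_csv("1.csv")
df2 = pd.read_csv("2.csv")
df3 = pd.read_csv("3.csv")
# Left-merge on both key columns; the value columns of df2 and df3 are appended in turn
finalcsv = df1.merge(df2, how='left', on=['id', 'env']).merge(df3, how='left', on=['id', 'env'])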
Result:
id status env successPct24 successPctMonth
0 aaaa PASS PROD 99.73 99.70
1 aaaa PASS DEV 99.89 99.90
2 bbbb PASS PROD 100.00 100.00
3 bbbb PASS DEV 92.53 99.91
If you need a different column order:
finalcsv = finalcsv[['id', 'status', 'successPct24', 'successPctMonth', 'env']]
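If more files are added later, the chained merge can be folded over a list with functools.reduce. The sketch below assumes every file carries both id and env columns. It inlines the sample data through io.StringIO (CSV1/CSV2/CSV3 are illustrative names), reorders the columns as asked in the question, and calls to_csv with index=False so the default RangeIndex is not written out.

```python
import io
from functools import reduce

import pandas as pd

# Sample data from the question; with real files use
# frames = [pd.read_csv(p) for p in ("1.csv", "2.csv", "3.csv")]
CSV1 = """id,status,env
aaaa,PASS,PROD
aaaa,PASS,DEV
bbbb,PASS,PROD
bbbb,PASS,DEV
"""
CSV2 = """id,successPct24,env
aaaa,"99.73",PROD
aaaa,"99.89",DEV
bbbb,"100.00",PROD
bbbb,"92.53",DEV
"""
CSV3 = """id,successPctMonth,env
aaaa,"99.70",PROD
aaaa,"99.90",DEV
bbbb,"100.00",PROD
bbbb,"99.91",DEV
"""

keys = ["id", "env"]
frames = [pd.read_csv(io.StringIO(t)) for t in (CSV1, CSV2, CSV3)]

# Fold left to right: ((1 merge 2) merge 3), always on (id, env).
merged = reduce(lambda left, right: left.merge(right, how="left", on=keys), frames)

# Reorder to the layout from the question.
merged = merged[["id", "status", "successPct24", "successPctMonth", "env"]]

# With no path, to_csv returns the text; merged.to_csv("4.csv", index=False)
# would write the file instead.
text = merged.to_csv(index=False)
print(text)
```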