I have two CSV files sorted by ID.
File1.csv
ID X Y Z
1 10 20 30
3 23 12 15
5 40 50 60
File2.csv
ID X Y Z
1 5 10 15
2 40 50 60
5 55 12 22
I want to iterate through both files, look at the ID (row[0]), and do the following:
If an ID exists in both files (here "1" and "5"), add a record to a new file named diff.csv:
ID x1 x2 diffx y1 y2 diffy z1 z2 diffz
1 10 5 5 20 10 10 30 15 15
5 40 55 -15 50 12 38 60 22 38
If an ID exists only in the first file, add that ID to onlyf1.csv:
ID
3
If an ID exists only in the second file, add that ID to onlyf2.csv:
ID
2
So far I can only think of reading the files with pandas:
import pandas as pd
f1 = pd.read_csv("File1.csv")
f2 = pd.read_csv("File2.csv")
Can anyone help me filter the data and perform these operations?
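You can merge the two frames first, then groupby the columns on their shared prefix and take the diff, and finally concat the result back onto the merged df. Note that diff here computes the second file minus the first.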
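import numpy as np
import pandas as pd
# df1 and df2 are the frames read from File1.csv and File2.csv
s = df1.merge(df2, on='ID', how='inner')
t = s.groupby(np.array(s.columns.str.split('_').str[0]), axis=1).diff().dropna(axis=1).add_suffix('DIFF')
pd.concat([s, t], axis=1).sort_index(axis=1)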
Out[896]:
ID X_x X_y X_yDIFF Y_x Y_y Y_yDIFF Z_x Z_y Z_yDIFF
0 1 10 5 -5.0 20 10 -10.0 30 15 -15.0
1 5 40 55 15.0 50 12 -38.0 60 22 -38.0
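The snippet above only covers the IDs found in both files, and recent pandas releases deprecate groupby(..., axis=1). Below is a minimal sketch of one way to get all three results from a single outer merge with indicator=True, computing the difference as File1 minus File2 as the question asks. The io.StringIO objects and the names merged, both, only_f1 and only_f2 are assumptions for illustration; with real files you would pass the paths to pd.read_csv instead.

```python
import io
import pandas as pd

# Sample data copied from the question; StringIO stands in for the files on disk.
file1 = io.StringIO("ID,X,Y,Z\n1,10,20,30\n3,23,12,15\n5,40,50,60\n")
file2 = io.StringIO("ID,X,Y,Z\n1,5,10,15\n2,40,50,60\n5,55,12,22\n")
f1 = pd.read_csv(file1)
f2 = pd.read_csv(file2)

# Outer merge keeps every ID; the _merge column records which file(s) had it.
merged = f1.merge(f2, on="ID", how="outer", suffixes=("1", "2"), indicator=True)

# IDs present in only one of the two files.
only_f1 = merged.loc[merged["_merge"] == "left_only", ["ID"]]
only_f2 = merged.loc[merged["_merge"] == "right_only", ["ID"]]

# Rows present in both files, with File1 - File2 differences per column.
both = merged.loc[merged["_merge"] == "both"].drop(columns="_merge")
diff = both[["ID"]].copy()
for col in ["X", "Y", "Z"]:
    low = col.lower()
    # The outer merge introduces NaN for unmatched rows, so the value columns
    # become float; the matched rows have no NaN and can be cast back to int.
    diff[low + "1"] = both[col + "1"].astype(int)
    diff[low + "2"] = both[col + "2"].astype(int)
    diff["diff" + low] = diff[low + "1"] - diff[low + "2"]
```

Each frame can then be written out with to_csv, for example diff.to_csv("diff.csv", index=False), and likewise only_f1 to onlyf1.csv and only_f2 to onlyf2.csv. On the sample data, diff holds the rows for IDs 1 and 5, only_f1 holds ID 3, and only_f2 holds ID 2.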