Assign values to columns in Pandas Dataframe based on data from another dataframe
I have two dataframes containing concentration data and coordinates:
Concentration Data (conc):
Sample analParam Conc Units
0 CW7-1 1,1,1-Trichloroethane 0 UG/L
1 CW7-1 1,1,2,2-Tetrachloroethane 0 UG/L
2 CW7-1 1,1,2-Trichloroethane 0 UG/L
3 CW7-1 1,1-Dichloroethane 0 UG/L
4 CW7-1 1,1-Dichloroethylene 0 UG/L
5 CW7-1 1,1-Dichloropropene 0 UG/L
6 CW7-1 1,2,3-Trichlorobenzene 0 UG/L
... ... ... ... ...
50311 VOA2-2 Tetrachloroethylene 1.8 MG/KG
50312 VOA2-2 Toluene 1.2 MG/KG
50313 VOA2-2 Trichloroethylene 1.8 MG/KG
50314 VOA2-2 Vinyl Chloride 1.8 MG/KG
Coordinate Data (coord):
Sample x y
0 CW7-1 320800.000 396500.000
1 CW7-2 320800.000 396500.000
2 CW7-3 320800.000 396500.000
3 FB06-17 0.000 0.000
4 FB06-18 0.000 0.000
5 FB06-19 0.000 0.000
6 FB07-08 0.000 0.000
... ... ... ...
453 TP21-1 318807.281 398547.485
454 TP21-2 318807.281 398547.485
455 TP24-1 318489.248 398544.797
456 VOA1-1 318500.582 398573.558
457 VOA1-2 318500.582 398573.558
458 VOA2-1 318536.337 398589.805
459 VOA2-2 318536.337 398589.805
I want to add two columns to my concentration dataframe that contain the coordinates of the respective Sample ID for each concentration. For example, the first six rows in the Concentration Data would have x = 320800 and y = 396500, since they all have a Sample ID of CW7-1:
Sample analParam Conc Units x y
0 CW7-1 1,1,1-Trichloroethane 0 UG/L 320800 396500
1 CW7-1 1,1,2,2-Tetrachloroethane 0 UG/L 320800 396500
2 CW7-1 1,1,2-Trichloroethane 0 UG/L 320800 396500
3 CW7-1 1,1-Dichloroethane 0 UG/L 320800 396500
4 CW7-1 1,1-Dichloroethylene 0 UG/L 320800 396500
5 CW7-1 1,1-Dichloropropene 0 UG/L 320800 396500
I've tried using nested for loops, but that is far too slow because I have so many data points:
for index, row in conc.iterrows():
    for cindex, crow in coord.iterrows():
        if conc.iloc[index,0] == coord.iloc[cindex,0]:
            conc.at[index,4] = coord.iloc[cindex,1]
            conc.at[index,5] = coord.iloc[cindex,2]
I've tried using the apply function, but I keep getting errors. For this version, I got TypeError: __call__() takes from 1 to 2 positional arguments but 3 were given.
def xcoord(i):
    for index, row in coord.iterrows():
        if i == coord.iloc[index,0]:
            return coord.iloc(index,4)
conc['Sample'].apply(xcoord)
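A note on that TypeError: `iloc` is indexed with square brackets, so `coord.iloc(index,4)` calls the indexer instead of indexing it, and coord has no column at position 4 in any case. A lookup-based version that avoids the row loop entirely might look like the sketch below. It assumes every Sample ID appears only once in coord, and it uses small stand-in frames built from a few of the values shown above rather than the full data.

```python
import pandas as pd

# Small stand-in frames built from a few rows shown in the question.
conc = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-1", "VOA2-2"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane", "Toluene"],
    "Conc": [0.0, 0.0, 1.2],
    "Units": ["UG/L", "UG/L", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "VOA2-2"],
    "x": [320800.000, 318536.337],
    "y": [396500.000, 398589.805],
})

# Index the coordinates by Sample so each ID points at one row.
# Assumption: Sample IDs are unique in coord; duplicates would make map() fail.
lookup = coord.set_index("Sample")

# Series.map performs the lookup for every row at once, no Python-level loop.
conc["x"] = conc["Sample"].map(lookup["x"])
conc["y"] = conc["Sample"].map(lookup["y"])

print(conc)
```

Sample IDs that are missing from coord would come out as NaN in the new columns instead of raising an error.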
Thanks Wen!
In[1]:
conc.merge(coord,on='Sample',how='left')
Out[1]:
Sample analParam Conc Units x y
0 CW7-1 1,1,1-Trichloroethane 0 UG/L 320800.000 396500.000
1 CW7-1 1,1,2,2-Tetrachloroethane 0 UG/L 320800.000 396500.000
2 CW7-1 1,1,2-Trichloroethane 0 UG/L 320800.000 396500.000
3 CW7-1 1,1-Dichloroethane 0 UG/L 320800.000 396500.000
4 CW7-1 1,1-Dichloroethylene 0 UG/L 320800.000 396500.000
5 CW7-1 1,1-Dichloropropene 0 UG/L 320800.000 396500.000
6 CW7-1 1,2,3-Trichlorobenzene 0 UG/L 320800.000 396500.000
... ... ... ... ... ... ...
50311 VOA2-2 Tetrachloroethylene 1.8 MG/KG 318536.337 398589.805
50312 VOA2-2 Toluene 1.2 MG/KG 318536.337 398589.805
50313 VOA2-2 trans-1,3-Dichloropropene 1.8 MG/KG 318536.337 398589.805
50314 VOA2-2 Trichloroethylene 1.8 MG/KG 318536.337 398589.805
50315 VOA2-2 Vinyl Chloride 1.8 MG/KG 318536.337 398589.805
50316 VOA2-2 Xylenes (Total) 2.6 MG/KG 318536.337 398589.805
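For anyone who wants to try the merge on a small scale, here is a minimal sketch. The frames reuse a few values from the tables above; the Sample ID `XX-1` is hypothetical and is only there to show what a left merge does with a sample that has no coordinates. The `validate="many_to_one"` argument is an optional safety check I added, not part of the original one-liner.

```python
import pandas as pd

conc = pd.DataFrame({
    # "XX-1" is a hypothetical ID with no entry in coord.
    "Sample": ["CW7-1", "CW7-1", "VOA2-2", "XX-1"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane",
                  "Toluene", "Toluene"],
    "Conc": [0.0, 0.0, 1.2, 0.5],
    "Units": ["UG/L", "UG/L", "MG/KG", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "VOA2-2"],
    "x": [320800.000, 318536.337],
    "y": [396500.000, 398589.805],
})

# how="left" keeps every row of conc, matched or not.
# validate="many_to_one" raises MergeError if a Sample ID is duplicated
# in coord, which would otherwise silently multiply rows.
merged = conc.merge(coord, on="Sample", how="left", validate="many_to_one")

print(merged)

# Rows whose Sample has no coordinates end up with NaN in x and y.
print(merged[merged["x"].isna()])
```

Note that `merge` returns a new dataframe and leaves conc unchanged, so assign the result back (`conc = conc.merge(...)`) if you want to keep the added columns.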