Assign values to columns in Pandas Dataframe based on data from another dataframe

I have two dataframes containing concentration data and coordinates:

Concentration Data (conc):

    Sample  analParam                 Conc  Units
0   CW7-1   1,1,1-Trichloroethane     0     UG/L
1   CW7-1   1,1,2,2-Tetrachloroethane 0     UG/L
2   CW7-1   1,1,2-Trichloroethane     0     UG/L
3   CW7-1   1,1-Dichloroethane        0     UG/L
4   CW7-1   1,1-Dichloroethylene      0     UG/L
5   CW7-1   1,1-Dichloropropene       0     UG/L
6   CW7-1   1,2,3-Trichlorobenzene    0     UG/L
... ... ... ... ...
50311   VOA2-2  Tetrachloroethylene  1.8    MG/KG
50312   VOA2-2  Toluene              1.2    MG/KG
50313   VOA2-2  Trichloroethylene    1.8    MG/KG
50314   VOA2-2  Vinyl Chloride       1.8    MG/KG

Coordinate Data (coord):

    Sample  x            y
0   CW7-1   320800.000  396500.000
1   CW7-2   320800.000  396500.000
2   CW7-3   320800.000  396500.000
3   FB06-17 0.000       0.000
4   FB06-18 0.000       0.000
5   FB06-19 0.000       0.000
6   FB07-08 0.000       0.000
... ... ... ...
453 TP21-1  318807.281  398547.485
454 TP21-2  318807.281  398547.485
455 TP24-1  318489.248  398544.797
456 VOA1-1  318500.582  398573.558
457 VOA1-2  318500.582  398573.558
458 VOA2-1  318536.337  398589.805
459 VOA2-2  318536.337  398589.805

I want to add two columns to my concentration dataframe that contain the coordinates of the respective Sample ID for each concentration. For example, the first six rows in the Concentration Data would have x = 320800 and y = 396500, since they all have a Sample ID of CW7-1:

    Sample  analParam                 Conc  Units   x       y
0   CW7-1   1,1,1-Trichloroethane     0     UG/L    320800  396500   
1   CW7-1   1,1,2,2-Tetrachloroethane 0     UG/L    320800  396500
2   CW7-1   1,1,2-Trichloroethane     0     UG/L    320800  396500  
3   CW7-1   1,1-Dichloroethane        0     UG/L    320800  396500  
4   CW7-1   1,1-Dichloroethylene      0     UG/L    320800  396500  
5   CW7-1   1,1-Dichloropropene       0     UG/L    320800  396500

I've tried using nested for loops, but it is far too slow because I have so many data points:

for index, row in conc.iterrows():
    for cindex, crow in coord.iterrows():
        if conc.iloc[index,0] == coord.iloc[cindex,0]:
            conc.at[index,4] = coord.iloc[cindex,1]
            conc.at[index,5] = coord.iloc[cindex,2]

I've tried using the apply function, but I keep getting errors. For this attempt, I got TypeError: __call__() takes from 1 to 2 positional arguments but 3 were given.

def xcoord (i):
    for index, row in coord.iterrows():
        if i == coord.iloc[index,0] :
            return coord.iloc(index,4)
conc['Sample'].apply(xcoord)
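
A note on that error: it most likely comes from `coord.iloc(index,4)`, which calls `iloc` with parentheses instead of indexing it with square brackets. Even with brackets, position 4 would be out of range, because `coord` has only three columns. The following is a minimal sketch of a per-sample lookup that avoids scanning `coord` for every row. It assumes each Sample appears only once in `coord`; the small frames are stand-ins built from values shown in the tables above, not the full data.

```python
import pandas as pd

# Small stand-in frames (values copied from the tables in the question).
conc = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-1", "VOA2-2"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane", "Toluene"],
    "Conc": [0.0, 0.0, 1.2],
    "Units": ["UG/L", "UG/L", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-2", "VOA2-2"],
    "x": [320800.000, 320800.000, 318536.337],
    "y": [396500.000, 396500.000, 398589.805],
})

# Index coord by Sample once, so each lookup is by label rather than a scan.
# Assumption: Sample values in coord are unique.
lookup = coord.set_index("Sample")

# Series.map looks up every Sample in the index of the given Series.
conc["x"] = conc["Sample"].map(lookup["x"])
conc["y"] = conc["Sample"].map(lookup["y"])
print(conc)
```

Samples that have no entry in `coord` would come out as NaN in the new columns.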

Thanks Wen!

In[1]:
conc.merge(coord,on='Sample',how='left')

Out[1]:
       Sample  analParam                  Conc  Units  x           y
0      CW7-1   1,1,1-Trichloroethane      0     UG/L   320800.000  396500.000
1      CW7-1   1,1,2,2-Tetrachloroethane  0     UG/L   320800.000  396500.000
2      CW7-1   1,1,2-Trichloroethane      0     UG/L   320800.000  396500.000
3      CW7-1   1,1-Dichloroethane         0     UG/L   320800.000  396500.000
4      CW7-1   1,1-Dichloroethylene       0     UG/L   320800.000  396500.000
5      CW7-1   1,1-Dichloropropene        0     UG/L   320800.000  396500.000
6      CW7-1   1,2,3-Trichlorobenzene     0     UG/L   320800.000  396500.000
...    ...     ...                        ...   ...    ...         ...
50311  VOA2-2  Tetrachloroethylene        1.8   MG/KG  318536.337  398589.805
50312  VOA2-2  Toluene                    1.2   MG/KG  318536.337  398589.805
50313  VOA2-2  trans-1,3-Dichloropropene  1.8   MG/KG  318536.337  398589.805
50314  VOA2-2  Trichloroethylene          1.8   MG/KG  318536.337  398589.805
50315  VOA2-2  Vinyl Chloride             1.8   MG/KG  318536.337  398589.805
50316  VOA2-2  Xylenes (Total)            2.6   MG/KG  318536.337  398589.805
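
For anyone reusing the merge above, here is a small self-contained sketch of the same left join with two optional checks added. `merge` returns a new dataframe, so the result has to be assigned to keep it. The `validate` and `indicator` arguments are not part of the original solution. The Sample ID `ZZ9-9` is made up for illustration, to show what happens when a sample has no coordinates.

```python
import pandas as pd

# Stand-in data; "ZZ9-9" is a hypothetical Sample with no coordinates.
conc = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-1", "VOA2-2", "ZZ9-9"],
    "analParam": ["1,1,1-Trichloroethane", "1,1-Dichloroethane",
                  "Toluene", "Toluene"],
    "Conc": [0.0, 0.0, 1.2, 0.5],
    "Units": ["UG/L", "UG/L", "MG/KG", "MG/KG"],
})
coord = pd.DataFrame({
    "Sample": ["CW7-1", "CW7-2", "VOA2-2"],
    "x": [320800.000, 320800.000, 318536.337],
    "y": [396500.000, 396500.000, 398589.805],
})

# how="left" keeps every row of conc, matched or not.
# validate="many_to_one" raises an error if a Sample is duplicated in coord,
# which would otherwise silently multiply rows in the result.
# indicator=True adds a "_merge" column recording where each row came from.
merged = conc.merge(coord, on="Sample", how="left",
                    validate="many_to_one", indicator=True)

# Samples that found no coordinates have NaN in x and y.
missing = merged.loc[merged["_merge"] == "left_only", "Sample"].unique()
print(merged)
print("Samples without coordinates:", list(missing))
```

Once the check has passed, the helper column can be dropped with `merged.drop(columns="_merge")`.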


