Subtracting columns based on key column in pandas dataframe

I have two dataframes that look like this:

df1:

   ID    A   B   C   D 
0 'ID1' 0.5 2.1 3.5 6.6
1 'ID2' 1.2 5.5 4.3 2.2
2 'ID1' 0.7 1.2 5.6 6.0 
3 'ID3' 1.1 7.2 10. 3.2

df2:

   ID    A   B   C   D 
0 'ID1' 1.0 2.0 3.3 4.4
1 'ID2' 1.5 5.0 4.0 2.2
2 'ID3' 0.6 1.2 5.9 6.2 
3 'ID4' 1.1 7.2 8.5 3.0

df1 can have multiple entries with the same ID, whereas each ID occurs only once in df2. Also, not every ID in df2 is necessarily present in df1. I can't solve this with set_index() alone, because multiple rows in df1 can share the same ID and the IDs in df1 and df2 are not aligned.

I want to create a new dataframe in which the values in df2[['A','B','C','D']] are subtracted from df1[['A','B','C','D']], matched on ID.

The resulting dataframe would look like:

df_new:

   ID     A    B   C   D 
0 'ID1' -0.5  0.1 0.2 2.2
1 'ID2' -0.3  0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3'  0.5  6.0 4.1 -3.0

I know how to do this with a loop, but since I'm dealing with very large amounts of data, that is not practical. What is the best way to approach this with pandas?

You just need set_index and subtract:

(df1.set_index('ID')-df2.set_index('ID')).dropna(axis=0)
Out[174]: 
         A    B    C    D
ID                       
'ID1' -0.5  0.1  0.2  2.2
'ID1' -0.3 -0.8  2.3  1.6
'ID2' -0.3  0.5  0.3  0.0
'ID3'  0.5  6.0  4.1 -3.0
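
To see why the dropna step matters, here is a minimal self-contained sketch that rebuilds the question's frames and runs the same subtraction. The IDs are written without the literal quote characters shown in the printed tables, and the names raw, unmatched and result are just illustrative. Index alignment takes the union of the IDs, so 'ID4' (present only in df2) comes back as an all-NaN row until it is dropped.

```python
import pandas as pd

# Sample frames mirroring the question (IDs without the literal quotes).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

# Subtraction aligns rows on the ID index; IDs missing on one side become NaN.
raw = df1.set_index("ID") - df2.set_index("ID")

# IDs whose whole row is NaN had no partner in the other frame.
unmatched = raw.index[raw.isna().all(axis=1)]

# Keep only the rows that were actually matched.
result = raw.dropna(axis=0)

print(list(unmatched))
print(result.round(1))
```

Note that dropna also discards any row where df1 itself contained a NaN, so this is only safe when the input columns are complete.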

If the row order matters, add a reindex on df2:

(df1.set_index('ID')-df2.set_index('ID').reindex(df1.ID)).dropna(axis=0).reset_index()
Out[211]: 
      ID    A    B    C    D
0  'ID1' -0.5  0.1  0.2  2.2
1  'ID2' -0.3  0.5  0.3  0.0
2  'ID1' -0.3 -0.8  2.3  1.6
3  'ID3'  0.5  6.0  4.1 -3.0
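
The reindex idea can be wrapped in a small helper. This is only a sketch under the question's assumptions (each ID occurs once in the second frame); subtract_by_key and its parameter names are hypothetical, not part of pandas. It looks up one row of the second frame per row of the first, then subtracts the underlying arrays so the original row order is preserved.

```python
import pandas as pd

def subtract_by_key(left, right, key, cols):
    """Subtract right[cols] from left[cols], matching rows on `key`.

    Hypothetical helper; assumes `key` is unique in `right`.
    """
    # One row of `right` per row of `left`, in left's order (NaN if no match).
    lookup = right.set_index(key)[cols].reindex(left[key])
    # Both arrays now have the same shape, so subtract position by position.
    diff = left[cols].to_numpy() - lookup.to_numpy()
    out = pd.DataFrame(diff, columns=cols, index=left.index)
    out.insert(0, key, left[key])
    # Drop rows where every value is NaN, which includes unmatched keys.
    return out.dropna(subset=cols, how="all").reset_index(drop=True)

# Sample frames mirroring the question (IDs without the literal quotes).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

res = subtract_by_key(df1, df2, "ID", ["A", "B", "C", "D"])
print(res.round(1))
```

If an ID were duplicated in the second frame, reindex would raise an error, so that case would have to be deduplicated or aggregated first.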

Similar to what Wen (who beat me to it) proposed, you can use pd.DataFrame.subtract:

df1.set_index('ID').subtract(df2.set_index('ID')).dropna()

         A    B    C    D
ID                       
'ID1' -0.5  0.1  0.2  2.2
'ID1' -0.3 -0.8  2.3  1.6
'ID2' -0.3  0.5  0.3  0.0
'ID3'  0.5  6.0  4.1 -3.0

One method is to use numpy. We can extract the required ordered indices from df2 using numpy.searchsorted.

Then feed this into the construction of a new dataframe.

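import numpy as np
import pandas as pd

# Assumes df2['ID'] is sorted and every ID in df1 also occurs in df2.
idx = np.searchsorted(df2['ID'], df1['ID'])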

res = pd.DataFrame(df1.iloc[:, 1:].values - df2.iloc[:, 1:].values[idx],
                   index=df1['ID']).reset_index()

print(res)

      ID    0    1    2    3
0  'ID1' -0.5  0.1  0.2  2.2
1  'ID2' -0.3  0.5  0.3  0.0
2  'ID1' -0.3 -0.8  2.3  1.6
3  'ID3'  0.5  6.0  4.1 -3.0
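
The searchsorted approach depends on two things the question does not guarantee in general: the lookup keys must be sorted, and every ID being looked up must exist. A hedged sketch of a more defensive variant follows; subtract_searchsorted and its parameter names are hypothetical. It sorts the lookup frame first, checks each insertion point against the actual key, and keeps the column names.

```python
import numpy as np
import pandas as pd

def subtract_searchsorted(left, right, key, cols):
    """Subtract right[cols] from left[cols] via np.searchsorted (hypothetical helper)."""
    # searchsorted needs sorted keys, so sort the lookup table first.
    right = right.sort_values(key)
    keys = right[key].to_numpy()
    wanted = left[key].to_numpy()
    # Insertion points; clip so a key past the last entry cannot index out of range.
    pos = np.clip(np.searchsorted(keys, wanted), 0, len(keys) - 1)
    # An insertion point is not proof of a match, so compare the keys explicitly.
    found = keys[pos] == wanted
    diff = left.loc[found, cols].to_numpy() - right[cols].to_numpy()[pos[found]]
    out = pd.DataFrame(diff, columns=cols)
    out.insert(0, key, wanted[found])
    return out

# Sample frames mirroring the question (IDs without the literal quotes).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

res = subtract_searchsorted(df1, df2, "ID", ["A", "B", "C", "D"])
print(res.round(1))
```

Rows of the first frame whose ID has no match are silently dropped here; raising an error instead may be the better choice when a missing ID indicates bad data.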
