Subtracting columns based on key column in pandas dataframe
I have two dataframes that look like this:
df1:
ID A B C D
0 'ID1' 0.5 2.1 3.5 6.6
1 'ID2' 1.2 5.5 4.3 2.2
2 'ID1' 0.7 1.2 5.6 6.0
3 'ID3' 1.1 7.2 10. 3.2
df2:
ID A B C D
0 'ID1' 1.0 2.0 3.3 4.4
1 'ID2' 1.5 5.0 4.0 2.2
2 'ID3' 0.6 1.2 5.9 6.2
3 'ID4' 1.1 7.2 8.5 3.0
df1 can have multiple entries with the same ID, whereas each ID occurs only once in df2. Also, not every ID in df2 is necessarily present in df1. I can't solve this by using set_index(), because multiple rows in df1 can share the same ID and the IDs in df1 and df2 are not aligned.
I want to create a new dataframe where I subtract the values in df2[['A','B','C','D']] from df1[['A','B','C','D']] based on matching the ID.
The resulting dataframe would look like:
df_new:
ID A B C D
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
I know how to do this with a loop, but since I'm dealing with huge quantities of data this is not practical at all. What is the best way of approaching this with pandas?
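You just need set_index and a subtraction. The subtraction aligns rows on ID, and dropna then removes the rows for IDs present in only one frame (ID4 here):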
(df1.set_index('ID')-df2.set_index('ID')).dropna(axis=0)
Out[174]:
A B C D
ID
'ID1' -0.5 0.1 0.2 2.2
'ID1' -0.3 -0.8 2.3 1.6
'ID2' -0.3 0.5 0.3 0.0
'ID3' 0.5 6.0 4.1 -3.0
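For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The frames are rebuilt from the question's data, assuming the IDs are plain strings; the variable name `diff` is just for illustration.

```python
import pandas as pd

# Sample frames copied from the question (IDs assumed to be plain strings).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

# The subtraction aligns rows on the ID index. IDs found in only one
# frame (ID4 here) become all-NaN rows, which dropna() removes.
diff = (df1.set_index("ID") - df2.set_index("ID")).dropna(axis=0)
print(diff.round(1))
```

Note that this relies on the IDs in df2 being unique, as stated in the question. The row order of the result follows the index alignment, not the original order of df1.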
If the order matters, add reindex for df2:
(df1.set_index('ID')-df2.set_index('ID').reindex(df1.ID)).dropna(axis=0).reset_index()
Out[211]:
ID A B C D
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
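As an alternative to the reindex approach (not taken from the answer above), a left merge also keeps the row order of df1 and does not need an index round trip. The sketch below assumes the value columns are named A to D as in the question; `value_cols`, `rhs_cols` and the `_df2` suffix are illustrative names.

```python
import pandas as pd

# Sample frames copied from the question (IDs assumed to be plain strings).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

value_cols = ["A", "B", "C", "D"]
rhs_cols = [c + "_df2" for c in value_cols]

# A left merge keeps every df1 row in its original order and attaches the
# matching df2 values under a "_df2" suffix (NaN where df2 lacks the ID).
# validate="many_to_one" raises if an ID occurs more than once in df2.
merged = df1.merge(df2, on="ID", how="left",
                   suffixes=("", "_df2"), validate="many_to_one")

# Subtract on the raw arrays so the differing column names do not matter.
diff = pd.DataFrame(
    merged[value_cols].to_numpy() - merged[rhs_cols].to_numpy(),
    columns=value_cols,
)
df_new = pd.concat([merged[["ID"]], diff], axis=1)
print(df_new.round(1))
```

With this variant, a df1 row whose ID is missing from df2 stays in the result with NaN values rather than being dropped, so add a dropna() if that is not wanted.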
Similarly to what Wen (who beat me to it) proposed, you can use pd.DataFrame.subtract:
df1.set_index('ID').subtract(df2.set_index('ID')).dropna()
A B C D
ID
'ID1' -0.5 0.1 0.2 2.2
'ID1' -0.3 -0.8 2.3 1.6
'ID2' -0.3 0.5 0.3 0.0
'ID3' 0.5 6.0 4.1 -3.0
One method is to use numpy. We can extract the ordered indices required from df2 using numpy.searchsorted, then feed them into the construction of a new dataframe. This assumes df2['ID'] is sorted and that every ID in df1 is present in df2.
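import numpy as np
import pandas as pd
# position of each df1 ID within df2['ID'] (df2['ID'] must be sorted)
idx = np.searchsorted(df2['ID'], df1['ID'])
res = pd.DataFrame(df1.iloc[:, 1:].values - df2.iloc[:, 1:].values[idx],
                   index=df1['ID']).reset_index()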
print(res)
ID 0 1 2 3
0 'ID1' -0.5 0.1 0.2 2.2
1 'ID2' -0.3 0.5 0.3 0.0
2 'ID1' -0.3 -0.8 2.3 1.6
3 'ID3' 0.5 6.0 4.1 -3.0
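A slightly more defensive variant of the searchsorted idea is sketched below. It does not require df2 to be sorted by ID (it passes a `sorter` to np.searchsorted), checks that every looked-up ID really exists, and keeps the column names. The names `keys`, `lookup`, `order` and `value_cols` are illustrative, and the frames are rebuilt from the question assuming plain string IDs.

```python
import numpy as np
import pandas as pd

# Sample frames copied from the question (IDs assumed to be plain strings).
df1 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID1", "ID3"],
    "A": [0.5, 1.2, 0.7, 1.1],
    "B": [2.1, 5.5, 1.2, 7.2],
    "C": [3.5, 4.3, 5.6, 10.0],
    "D": [6.6, 2.2, 6.0, 3.2],
})
df2 = pd.DataFrame({
    "ID": ["ID1", "ID2", "ID3", "ID4"],
    "A": [1.0, 1.5, 0.6, 1.1],
    "B": [2.0, 5.0, 1.2, 7.2],
    "C": [3.3, 4.0, 5.9, 8.5],
    "D": [4.4, 2.2, 6.2, 3.0],
})

value_cols = ["A", "B", "C", "D"]
keys = np.asarray(df2["ID"], dtype=str)    # unique IDs to search in
lookup = np.asarray(df1["ID"], dtype=str)  # IDs to look up (may repeat)

# np.searchsorted needs sorted input; `sorter` supplies the sort order,
# so df2 itself does not have to be sorted by ID.
order = np.argsort(keys, kind="stable")
pos = np.searchsorted(keys, lookup, sorter=order)

# searchsorted returns an insertion point, not a match, so verify that
# every looked-up ID exists in df2 before using the positions.
pos = np.clip(pos, 0, len(keys) - 1)
idx = order[pos]
if not (keys[idx] == lookup).all():
    raise KeyError("some IDs in df1 are missing from df2")

res = pd.DataFrame(
    df1[value_cols].to_numpy() - df2[value_cols].to_numpy()[idx],
    columns=value_cols,
)
res.insert(0, "ID", df1["ID"].to_numpy())
print(res.round(1))
```

The explicit check matters because a missing ID would otherwise silently pick up the values of a neighbouring row in df2.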