How to subtract values in one dataframe from the other based on multiple columns?
I have the following dataframe:
df1 = pd.DataFrame({
    'contact_id': [1, 3, 4, 5, -1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff']
})
print(df1)
   contact_id subscription_id
0           1             AAA
1           3             ccc
2           4             ddd
3           5             eee
4          -1             fff
The second dataframe:
df2 = pd.DataFrame({
    'contact_id': [1, 2, -1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op']
})
print(df2)
   contact_id subscription_id extra
0           1             AAA    we
1           2             bbb    kl
2          -1             fff    op
Expected output:
   contact_id subscription_id extra
1           3             ccc   NaN
2           4             ddd   NaN
3           5             eee   NaN
My code:
import pandas as pd
df1 = pd.DataFrame({
    'contact_id': [1, 3, 4, 5, -1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff']
})
print(df1)
df2 = pd.DataFrame({
    'contact_id': [1, 2, -1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op']
})
print(df2)
sub = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
print(sub)
Can anyone point out what I am doing wrong?
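What you want is basically the result of a left join minus an inner join. This looks like a typical case for merge, not pd.concat.
Use df.merge with a left join and indicator=True. Then select only the left_only rows to keep the rows that exist only in df1: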
In [1586]: df1.merge(df2, how='left', indicator=True).query('_merge == "left_only"').drop(columns='_merge')
Out[1586]:
   contact_id subscription_id extra
1           3             ccc   NaN
2           4             ddd   NaN
3           5             eee   NaN
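The one-liner above relies on merge picking the shared column names as join keys by default. As a self-contained sketch of the same left-join-plus-indicator idea, the snippet below passes the keys explicitly through on= so that a later, unrelated shared column would not change the result. The helper name anti_join and the keys list are illustrative assumptions, not part of the original answer or of pandas.

```python
import pandas as pd

df1 = pd.DataFrame({
    'contact_id': [1, 3, 4, 5, -1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff'],
})
df2 = pd.DataFrame({
    'contact_id': [1, 2, -1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op'],
})

# Columns that identify a row (an assumption taken from the question).
keys = ['contact_id', 'subscription_id']


def anti_join(left, right, on):
    """Hypothetical helper: rows of `left` whose key combination is absent from `right`."""
    # indicator=True adds a `_merge` column with the values
    # 'left_only', 'right_only' or 'both'.
    merged = left.merge(right, how='left', on=on, indicator=True)
    # Keep rows found only in `left`, then drop the helper column.
    return merged.loc[merged['_merge'] == 'left_only'].drop(columns='_merge')


result = anti_join(df1, df2, keys)
print(result)
```

Columns that exist only in df2 (here extra) are carried along and are NaN for every remaining row, which matches the expected output in the question.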
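There are two things to note in your code
sub = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
First, you concatenate df2 twice. (This does no harm, since you drop the duplicates afterwards; it also makes sure that rows found only in df2 are dropped.) Second, you do not pass the subset parameter of pandas.DataFrame.drop_duplicates, so by default pandas uses all columns to identify duplicates. After the concat, the rows coming from df1 have NaN in extra, so they never count as duplicates of the rows coming from df2.
Since you do not need the extra column, you can use boolean indexing: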
mask = df1['contact_id'].isin(df2['contact_id']) & df1['subscription_id'].isin(df2['subscription_id'])
df1 = df1.loc[~mask]
# print(df1)
   contact_id subscription_id
1           3             ccc
2           4             ddd
3           5             eee
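One caveat on the boolean-indexing approach: the two isin checks test each column on its own, so a row whose contact_id and subscription_id both occur in df2, but in different rows, is dropped as well. The sketch below shows this with one extra row (1, 'bbb') that is not in the question's data and is added purely for illustration. It then compares the key pairs through pd.MultiIndex, and also shows the original concat idea working once subset= is passed. The variable names tricky, col_wise, pair_wise and by_concat are assumptions.

```python
import pandas as pd

df1 = pd.DataFrame({
    'contact_id': [1, 3, 4, 5, -1],
    'subscription_id': ['AAA', 'ccc', 'ddd', 'eee', 'fff'],
})
df2 = pd.DataFrame({
    'contact_id': [1, 2, -1],
    'subscription_id': ['AAA', 'bbb', 'fff'],
    'extra': ['we', 'kl', 'op'],
})
keys = ['contact_id', 'subscription_id']

# Hypothetical extra row: 1 and 'bbb' both occur in df2, but never together.
extra_row = pd.DataFrame({'contact_id': [1], 'subscription_id': ['bbb']})
tricky = pd.concat([df1, extra_row], ignore_index=True)

# Column-wise check (as in the answer above): also drops the (1, 'bbb') row.
col_mask = (tricky['contact_id'].isin(df2['contact_id'])
            & tricky['subscription_id'].isin(df2['subscription_id']))
col_wise = tricky.loc[~col_mask]

# Pair-wise check: compare whole (contact_id, subscription_id) tuples.
pair_mask = pd.MultiIndex.from_frame(tricky[keys]).isin(
    pd.MultiIndex.from_frame(df2[keys]))
pair_wise = tricky.loc[~pair_mask]

# The concat approach from the question, restricted to the key columns.
by_concat = pd.concat([df1, df2, df2]).drop_duplicates(subset=keys, keep=False)

print(col_wise)
print(pair_wise)
print(by_concat)
```

On the question's own data both checks give the same three rows; they differ only when key values are shared across different rows of df2.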