How to remove rows from Pandas dataframe if the same row exists in another dataframe but end up with all columns from both df
Check if row in dataframe exists in another dataframe and remove from both
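I am trying to check whether rows in one dataframe also exist in another and, if they do, remove them from both dataframes. All the examples I have seen so far use pd.merge, but that combines everything into a single dataframe. My goal is to keep two separate dataframes and drop the rows they have in common.
Example: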
df1:
id name class Grade
0 2547 John Math 119.01
1 2547 Joe Science 0.00
2 2547 Steve History 0.47
3 2547 Hari PE 5.70
df2:
id name class Grade
0 2547 John Math 119.01
1 2547 Joe Science 2
2 2547 Steve History 22
3 2547 Hari PE 5.71
expected output:
df1:
id name class Grade
0 2547 Joe Science 0.00
1 2547 Steve History 0.47
2 2547 Hari PE 5.70
df2:
id name class Grade
0 2547 Joe Science 2
1 2547 Steve History 22
2 2547 Hari PE 5.71
So far I have tried the following, but it does not help because it merges the two dataframes into one:
df = pd.merge(df1, df2, on=['Grade'], how='outer')
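You can use an inner merge to find the common rows, concatenate them onto each dataframe, and then drop every duplicated row without keeping any copy, using drop_duplicates(keep=False):
t = df1.merge(df2, how='inner')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df1 = pd.concat([df1, t]).drop_duplicates(keep=False)
df2 = pd.concat([df2, t]).drop_duplicates(keep=False)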
Output:
>>> df1
id name class Grade
1 2547 Joe Science 0.00
2 2547 Steve History 0.47
3 2547 Hari PE 5.70
>>> df2
id name class Grade
1 2547 Joe Science 2.00
2 2547 Steve History 22.00
3 2547 Hari PE 5.71
>>> t
id name class Grade
0 2547 John Math 119.01
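For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea on the question's sample data. The names common, df1_only and df2_only are only illustrative. One caveat, as far as I can tell: drop_duplicates(keep=False) also removes rows that are duplicated within a single dataframe, so this assumes each dataframe has no internal duplicates.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "id": [2547] * 4,
    "name": ["John", "Joe", "Steve", "Hari"],
    "class": ["Math", "Science", "History", "PE"],
    "Grade": [119.01, 0.00, 0.47, 5.70],
})
df2 = pd.DataFrame({
    "id": [2547] * 4,
    "name": ["John", "Joe", "Steve", "Hari"],
    "class": ["Math", "Science", "History", "PE"],
    "Grade": [119.01, 2.0, 22.0, 5.71],
})

# Inner merge on all shared columns: rows present in both dataframes
common = df1.merge(df2, how="inner")

# Adding the common rows makes them appear twice, so keep=False drops them
df1_only = pd.concat([df1, common]).drop_duplicates(keep=False)
df2_only = pd.concat([df2, common]).drop_duplicates(keep=False)

print(df1_only)
print(df2_only)
```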
Use the from_frame method to create MultiIndex objects, then use MultiIndex.isin to test membership and build a boolean mask for filtering the rows:
i1 = pd.MultiIndex.from_frame(df1)
i2 = pd.MultiIndex.from_frame(df2)
df1, df2 = df1[~i1.isin(i2)], df2[~i2.isin(i1)]
>>> df1
id name class Grade
1 2547 Joe Science 0.00
2 2547 Steve History 0.47
3 2547 Hari PE 5.70
>>> df2
id name class Grade
1 2547 Joe Science 2.00
2 2547 Steve History 22.00
3 2547 Hari PE 5.71
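The MultiIndex approach can be wrapped in a small helper. This is only a sketch: drop_common_rows and the toy dataframes left and right are made-up names for illustration. Unlike drop_duplicates(keep=False), the mask only looks at membership in the other dataframe, so rows repeated within one dataframe are kept.

```python
import pandas as pd

def drop_common_rows(a, b):
    """Return a and b without the rows that appear in both (hypothetical helper)."""
    ia = pd.MultiIndex.from_frame(a)
    ib = pd.MultiIndex.from_frame(b)
    # isin returns a boolean array; ~ inverts it to keep the non-shared rows
    return a[~ia.isin(ib)], b[~ib.isin(ia)]

# Toy data: (1, "a") is repeated inside left, (3, "c") is shared with right
left = pd.DataFrame({"k": [1, 1, 2, 3], "v": ["a", "a", "b", "c"]})
right = pd.DataFrame({"k": [3, 4], "v": ["c", "d"]})

left_only, right_only = drop_common_rows(left, right)
print(left_only)
print(right_only)
```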
Try this -
dataframe1 = pd.DataFrame(data={"column1": [1, 2, 3, 4, 5]})
dataframe2 = pd.DataFrame(data={"column1": [1, 2]})
common = dataframe1.merge(dataframe2, on=["column1"])
result = dataframe1[~dataframe1.column1.isin(common.column1)]
print(result)
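A different way to get the same result, not the one used above, is an anti-join with merge(indicator=True). The sketch below reuses the dataframes from this answer; flagged is just an illustrative name. Note that merge returns a fresh 0-based index, so the original row labels are not preserved.

```python
import pandas as pd

dataframe1 = pd.DataFrame({"column1": [1, 2, 3, 4, 5]})
dataframe2 = pd.DataFrame({"column1": [1, 2]})

# Left merge on all shared columns; the _merge column records where each row was found.
# drop_duplicates on the right side keeps the merge from multiplying left rows.
flagged = dataframe1.merge(dataframe2.drop_duplicates(), how="left", indicator=True)

# Keep the rows that exist only in dataframe1, then drop the helper column
result = flagged.loc[flagged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```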
For more details, see this article - https://www.kite.com/python/answers/how-to-get-rows-from-a-dataframe-that-are-not-in-another-dataframe-in-python
Or you can use this just to check -
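df = df1.merge(df2, on='Grade', suffixes=(' from df1', ' from df2'))
# Grade is numeric, so cast it to str before joining the two copies with '-'
df.insert(0, 'id', df['Grade'].astype(str) + '-' + df.pop('Grade').astype(str))
print(df)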