Pandas: drop partial duplicates
I have two DataFrames:
df1:
x y z
0 1 2 r
1 a c 2
2 22 g d
df2:
x y z
0 1 2 r
1 a b 2
2 3 g d
I want to drop the rows of df1 where the values in columns y and z are duplicated in df2.
Desired result:
x y z
1 a c 2
Because df1 and df2 have the same values in columns y and z in every other row (rows 0 and 2).
cols = ['y', 'z']  # columns to check for equal values
df1[~(df1[cols] == df2[cols]).all(axis=1)]  # keep the rows where y and z are not both equal (~) in the two dataframes
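For anyone who wants to run the row-by-row comparison end to end, here is a minimal sketch. It rebuilds the sample frames from the question (storing every value as a string is my assumption, since the question does not state dtypes) and compares the raw values, which assumes both frames have the same number of rows in the same order.

```python
import pandas as pd

# Sample frames from the question; all-string values are an assumption.
df1 = pd.DataFrame({"x": ["1", "a", "22"], "y": ["2", "c", "g"], "z": ["r", "2", "d"]})
df2 = pd.DataFrame({"x": ["1", "a", "3"], "y": ["2", "b", "g"], "z": ["r", "2", "d"]})

cols = ["y", "z"]  # columns that must all match for a row to be dropped

# Compare position by position on the underlying arrays, so differing index
# labels cannot interfere. Assumes equal row counts and row order.
same = (df1[cols].to_numpy() == df2[cols].to_numpy()).all(axis=1)

result = df1[~same]  # keep rows where y or z differs
print(result)
```

Note that this only compares row i of df1 with row i of df2. If the matching (y, z) pair may sit on a different row of df2, a merge-based approach is probably the better fit.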
Using pd.merge you can do:
In [266]: dff = df1.merge(df2, on=['y', 'z'], how='left', indicator=True,
suffixes=['', 'right'])
In [267]: dff.loc[dff['_merge'].eq('left_only'), ['x', 'y', 'z']]
Out[267]:
x y z
1 a c 2
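The same idea can be wrapped up as a small reusable "anti-join" helper. This is only a sketch: `drop_matching` is a hypothetical name of mine, not a pandas function, and the all-string sample data is an assumption. Unlike a row-by-row comparison, it drops a row of the left frame when its (y, z) pair appears anywhere in the right frame.

```python
import pandas as pd

# Sample frames from the question; all-string values are an assumption.
df1 = pd.DataFrame({"x": ["1", "a", "22"], "y": ["2", "c", "g"], "z": ["r", "2", "d"]})
df2 = pd.DataFrame({"x": ["1", "a", "3"], "y": ["2", "b", "g"], "z": ["r", "2", "d"]})


def drop_matching(left, right, cols):
    """Return rows of `left` whose values in `cols` do not occur in `right`.

    Hypothetical helper for illustration, not part of pandas.
    """
    # De-duplicate the keys so the left merge yields exactly one row per left row.
    keys = right[cols].drop_duplicates()
    merged = left[cols].merge(keys, on=cols, how="left", indicator=True)
    # A left merge keeps the left row order, so the mask lines up by position.
    mask = (merged["_merge"] == "left_only").to_numpy()
    return left[mask]


result = drop_matching(df1, df2, ["y", "z"])
print(result)
```

Building a positional mask instead of selecting from the merged frame keeps df1's original index and columns, so no suffix handling is needed.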
One solution could be:
df1[df1.merge(df2, how='left', on=['y', 'z']).x_y.isnull()]
Or, somewhat more low-key:
df1[(df1[['y', 'z']] != df2[['y', 'z']]).any(axis=1)]
Another way to achieve it is using loc:
pd.DataFrame(df1.loc[(df1.y != df2.y) | (df1.z != df2.z)])
Output:
x y z
1 a c 2