
Pandas Drop partial duplicates

I have two DataFrames:

df1:

    x  y  z
0   1  2  r
1   a  c  2
2  22  g  d

df2:

    x  y  z
0   1  2  r
1   a  b  2
2   3  g  d

I want to drop the rows where the values in columns y and z are duplicated across the two DataFrames.

Desired result:

        x  y  z
    1   a  c  2

Because in every other row, df1 and df2 have the same values in columns y and z.

 cols = ['y', 'z']  # columns to check for having the same value
 df1[~(df1[cols] == df2[cols]).all(axis=1)]  # keep the rows where y and z are `not both equal (~)` in the two dataframes
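
For anyone who wants to run this, here is a minimal self-contained sketch. The way the frames are built is an assumption (every value stored as a string, default integer index), since the question does not show dtypes. Note that `==` between two DataFrames compares the row at each index label with the row at the same label in the other frame, so this approach only applies when both frames share identical row and column labels.

```python
import pandas as pd

# Assumed reconstruction of the question's frames (all values as strings).
df1 = pd.DataFrame({"x": ["1", "a", "22"], "y": ["2", "c", "g"], "z": ["r", "2", "d"]})
df2 = pd.DataFrame({"x": ["1", "a", "3"], "y": ["2", "b", "g"], "z": ["r", "2", "d"]})

cols = ["y", "z"]  # columns to check for having the same value

# True where a row of df1 equals the row at the same index in df2 on both columns
same = (df1[cols] == df2[cols]).all(axis=1)

# Keep only the rows that differ in y or z
result = df1[~same]
print(result)
```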

Using pd.merge, you can do:

In [266]: dff = df1.merge(df2, on=['y', 'z'], how='left',  indicator=True,
                          suffixes=['', 'right'])

In [267]: dff.loc[dff['_merge'].eq('left_only'), ['x', 'y', 'z']]
Out[267]:
   x  y  z
1  a  c  2
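
A runnable variant of the same idea, as a sketch: unlike the element-wise comparison, a merge matches on values rather than on row position, so it should still work when df2 is in a different order. The frames below are an assumed reconstruction (string values), with df2 deliberately shuffled; `keys`, `merged`, and `result` are illustrative names. Only the key columns of df2 are merged here, which sidesteps the need for suffixes, and they are de-duplicated first so that a repeated key in df2 cannot multiply rows of df1.

```python
import pandas as pd

# Assumed reconstruction of df1 from the question (all values as strings).
df1 = pd.DataFrame({"x": ["1", "a", "22"], "y": ["2", "c", "g"], "z": ["r", "2", "d"]})
# Same rows as the question's df2, but reordered to show matching is by value.
df2 = pd.DataFrame({"x": ["3", "1", "a"], "y": ["g", "2", "b"], "z": ["d", "r", "2"]})

# Unique (y, z) pairs present in df2
keys = df2[["y", "z"]].drop_duplicates()

# indicator=True adds a '_merge' column: 'both' or 'left_only' for a left join
merged = df1.merge(keys, on=["y", "z"], how="left", indicator=True)

# Rows of df1 whose (y, z) pair does not occur anywhere in df2
result = merged.loc[merged["_merge"].eq("left_only"), ["x", "y", "z"]]
print(result)
```

The merged frame gets a fresh integer index, so the row label in the result matches df1 only because df1 uses the default index here.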

One solution could be:

df1[df1.merge(df2, how='left', on=['y', 'z']).x_y.isnull()]

Or, somewhat more low-key:

df1[(df1[['y', 'z']] != df2[['y', 'z']]).any(axis=1)]
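
To unpack the merge one-liner in this answer: both frames have an `x` column that is not a join key, so the merge renames them with the default suffixes to `x_x` (from df1) and `x_y` (from df2). After a left join, `x_y` is missing exactly for the df1 rows that found no partner in df2. A rough sketch, assuming string-valued frames, a default index on df1, no missing values in df2's `x`, and no repeated (y, z) pairs in df2 (otherwise the mask would not line up with df1):

```python
import pandas as pd

# Assumed reconstruction of the question's frames (all values as strings).
df1 = pd.DataFrame({"x": ["1", "a", "22"], "y": ["2", "c", "g"], "z": ["r", "2", "d"]})
df2 = pd.DataFrame({"x": ["1", "a", "3"], "y": ["2", "b", "g"], "z": ["r", "2", "d"]})

# Left join on the key columns; the clashing 'x' columns become 'x_x' and 'x_y'
merged = df1.merge(df2, how="left", on=["y", "z"])

# 'x_y' is NaN where df1's (y, z) pair has no match in df2
mask = merged["x_y"].isnull().to_numpy()

# Boolean array indexing is positional, so index labels do not need to align
result = df1[mask]
print(result)
```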

Another way to achieve it is using loc:

pd.DataFrame(df1.loc[(df1.y != df2.y) | (df1.z != df2.z)])

Output:

    x  y  z
1   a  c  2
