
Problem comparing 2 DataFrames, returns wrong result

There are two DataFrames, df1 and df2.


df1 contains:

    account_id  account_name
0   37469426    Name1
1   71508517    Name2
2   85304427    Name3
3   115964688   Name4
4   119853529   Name4

df2 contains:

    account_id  account_name
0   37469426    Name1
1   71508517    Name2
2   85304427    Name3
3   115964688   Name4
4   119853529   Name4
5   1111        Test

I want to compare them so that df3 contains the rows from df1 that are not in df2.

In this case it should return nothing.

The data types are the same and the columns are the same; only the number of rows differs.

I've tried concat and merge, but the result is wrong.


merged = pd.merge(df1 , df2, on=['account_id', 'account_name'], how='right')

#returns:

    account_id  account_name
0   37469426    Name1
1   71508517    Name2
2   85304427    Name3
3   115964688   Name4
4   119853529   Name5

merged = pd.merge(df1 , df2, on=['account_id', 'account_name'], how='left')

#returns:

0   37469426    Name1
1   71508517    Name2
2   85304427    Name3
3   115964688   Name4
4   119853529   Name4
5   1111        Test

#inner / outer return everything

0   37469426    Name1
1   71508517    Name2
2   85304427    Name3
3   115964688   Name4
4   119853529   Name4
5   1111        Test

compare_ga_accounts = pd.concat([df1 , df2])
compare_ga_accounts.drop_duplicates(keep=False, inplace=True)

#returns:

    account_id  account_name
0   1111        Test

I have no idea why this happens.

You can just use isin to compare the column values. For example:

compare_ga_accounts = df1[~df1.iloc[:, 0].isin(df2.iloc[:, 0]) | ~df1.iloc[:, 1].isin(df2.iloc[:, 1])]
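Note that isin checks each column independently, so a df1 row whose account_id appears in one df2 row and whose account_name appears in a different df2 row would still count as present. If whole rows need to match, one possible alternative is an anti-join built on merge with indicator=True. The following is only a minimal sketch under assumptions: the sample data is copied from the question, and the helper name rows_only_in_left is hypothetical, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "account_id": [37469426, 71508517, 85304427, 115964688, 119853529],
    "account_name": ["Name1", "Name2", "Name3", "Name4", "Name4"],
})
df2 = pd.DataFrame({
    "account_id": [37469426, 71508517, 85304427, 115964688, 119853529, 1111],
    "account_name": ["Name1", "Name2", "Name3", "Name4", "Name4", "Test"],
})


def rows_only_in_left(left, right):
    """Hypothetical helper: rows of `left` with no identical row in `right`.

    Assumes both frames share the same column names and dtypes.
    """
    keys = list(left.columns)
    # indicator=True adds a "_merge" column whose value is "left_only",
    # "right_only" or "both" for every row of the merged result.
    merged = left.merge(
        right[keys].drop_duplicates(),  # avoid multiplying rows on duplicates
        on=keys,
        how="left",
        indicator=True,
    )
    return merged.loc[merged["_merge"] == "left_only", keys]


# Rows of df1 that are missing from df2 (empty for the sample data).
df3 = rows_only_in_left(df1, df2)

# The reverse direction finds the extra row that exists only in df2.
only_in_df2 = rows_only_in_left(df2, df1)
```

With the sample data, df3 is empty, which is the result the question asks for. The concat plus drop_duplicates(keep=False) attempt returns the Test row because it yields rows that occur in only one of the two frames, in either direction, rather than only the rows of df1 missing from df2.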
