How to drop rows in one DataFrame based on one similar column in another DataFrame that has a different number of rows

I have two DataFrames that are completely dissimilar except for certain values in one particular column:

df
  First  Last   Email             Age
0 Adam   Smith  email1@email.com  30
1 John   Brown  email2@email.com  35
2 Joe    Max    email3@email.com  40
3 Will   Bill   email4@email.com  25
4 Johnny Jacks  email5@email.com  50

df2
  ID   Location  Contact
0 5435 Austin    email5@email.com
1 4234 Atlanta   email1@email.com
2 7896 Toronto   email3@email.com

How would I go about finding the matching values in the Email column of df and the Contact column of df2, and then dropping the whole row in df based on that match?

Output I'm looking for (index numbering doesn't matter):

df1
  First  Last   Email             Age
1 John   Brown  email2@email.com  35
3 Will   Bill   email4@email.com  25

I've been able to identify matches using a few different methods, like:

Changing the column names to be identical:

common = df.merge(df2,on=['Email'])
df3 = df[(~df['Email'].isin(common['Email']))]

But df3 still shows all the rows from df.

I've also tried:

common = df['Email'].isin(df2['Contact'])
df.drop(df[common].index, inplace = True)

Again, this identifies the matches, but df still contains all the original rows.

So the main thing I'm having difficulty with is either updating df with the matches dropped, or creating a new DataFrame that contains only the rows whose Email value in df does not appear in the Contact column of df2. I'd appreciate any suggestions.

As mentioned in the comments (@Arkadiusz), it is enough to filter your data using the following:

df3 = df[(~df['Email'].isin(df2.Contact))].copy()
print(df3)
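
For reference, here is a minimal, self-contained sketch of that filter, with the two frames rebuilt from the sample data in the question. The second half is only an assumption on my part: since the attempts in the question are logically equivalent to this filter, the real data may differ from the sample in stray whitespace or letter case, so the hypothetical helper `normalize` (not part of pandas) cleans both columns before comparing.

```python
import pandas as pd

# Sample frames rebuilt from the question
df = pd.DataFrame({
    "First": ["Adam", "John", "Joe", "Will", "Johnny"],
    "Last": ["Smith", "Brown", "Max", "Bill", "Jacks"],
    "Email": ["email1@email.com", "email2@email.com", "email3@email.com",
              "email4@email.com", "email5@email.com"],
    "Age": [30, 35, 40, 25, 50],
})
df2 = pd.DataFrame({
    "ID": [5435, 4234, 7896],
    "Location": ["Austin", "Atlanta", "Toronto"],
    "Contact": ["email5@email.com", "email1@email.com", "email3@email.com"],
})

# Boolean mask: True where df's Email also appears in df2's Contact
matches = df["Email"].isin(df2["Contact"])

# Keep only the rows that did NOT match; .copy() makes df3 independent of df
df3 = df[~matches].copy()
print(df3)


def normalize(s):
    """Hypothetical helper: strip surrounding whitespace and lowercase."""
    return s.str.strip().str.lower()


# Assumption: use this variant if the real columns contain stray spaces
# or inconsistent capitalization that prevent exact matches
matches_norm = normalize(df["Email"]).isin(normalize(df2["Contact"]))
df3_norm = df[~matches_norm].copy()
```

On the sample data both variants keep only the rows with index 1 and 3 (John Brown and Will Bill). If the plain `isin` filter keeps every row on your real data, comparing `df["Email"].unique()` with `df2["Contact"].unique()` should show whether the values really are identical.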
