Computing Set Difference in Pandas between two dataframes
I'm wondering how to compute a set difference in Python's pandas between two different dataframes.
One dataframe (df1) has the format:
State City Population
NY Albany 856654
WV Wheeling 23434
SC Charleston 35323
OH Columbus 343534
WV Charleston 34523
And the second dataframe (df2) is:
State City
WV Wheeling
OH Columbus
And I need an operation that returns the following dataframe:
State City Population
NY Albany 856654
SC Charleston 35323
WV Charleston 34523
Essentially, I can't figure out how to "subtract" df2 from df1 based on the two columns (I need both, since city names repeat across different states).
Do a left join with indicator=True, which adds a _merge column recording the origin of each row; you can then filter on that column to keep only the rows found in df1 alone:
df1.merge(df2, indicator=True, how="left")[lambda x: x._merge == 'left_only'].drop(columns='_merge')
#State City Population
#0 NY Albany 856654
#2 SC Charleston 35323
#4 WV Charleston 34523
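For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same left-join approach. It rebuilds df1 and df2 from the question and assumes the join keys are State and City; passing on= explicitly is my addition (by default merge already joins on the columns the two frames share).

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "State": ["NY", "WV", "SC", "OH", "WV"],
    "City": ["Albany", "Wheeling", "Charleston", "Columbus", "Charleston"],
    "Population": [856654, 23434, 35323, 343534, 34523],
})
df2 = pd.DataFrame({
    "State": ["WV", "OH"],
    "City": ["Wheeling", "Columbus"],
})

# Left join on both key columns; indicator=True adds a "_merge" column
# whose value is "left_only" for rows with no match in df2.
merged = df1.merge(df2, on=["State", "City"], how="left", indicator=True)

# Keep only the unmatched rows, then drop the helper column.
result = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(result)
```

Naming the key columns explicitly keeps the join from silently changing if df2 later gains another column that df1 also has.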
How about a filter?
df1[~((df1.City.isin(df2.City)) & (df1.State.isin(df2.State)))]
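One caveat with the filter above: it tests City and State independently, so it also drops a row whose city appears in one df2 row and whose state appears in another. A possible pairwise variant, sketched below, compares (State, City) tuples instead. The extra "OH Wheeling" row and its population are hypothetical, added only to show the difference; they are not part of the question's data.

```python
import pandas as pd

df1 = pd.DataFrame({
    "State": ["NY", "WV", "SC", "OH", "WV", "OH"],
    "City": ["Albany", "Wheeling", "Charleston", "Columbus", "Charleston", "Wheeling"],
    # Last value is a made-up placeholder for the hypothetical "OH Wheeling" row
    "Population": [856654, 23434, 35323, 343534, 34523, 1000],
})
df2 = pd.DataFrame({
    "State": ["WV", "OH"],
    "City": ["Wheeling", "Columbus"],
})

# Independent check: State and City are tested separately.
independent = df1[~(df1.City.isin(df2.City) & df1.State.isin(df2.State))]

# Pairwise check: each (State, City) tuple is tested as a unit.
keys1 = pd.MultiIndex.from_frame(df1[["State", "City"]])
keys2 = list(zip(df2["State"], df2["City"]))
pairwise = df1[~keys1.isin(keys2)]

print(independent)  # the hypothetical "OH Wheeling" row is dropped
print(pairwise)     # the hypothetical "OH Wheeling" row is kept
```

On the question's original five rows both versions give the same answer; they only diverge when a state and a city from different df2 rows happen to combine in df1.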