Computing the difference between two pandas DataFrames

Question

I'm wondering how to compute a set difference between two different DataFrames in Python's pandas.

One DataFrame (df1) has the format:

State  City          Population
NY     Albany        856654
WV     Wheeling      23434
SC     Charleston    35323
OH     Columbus      343534
WV     Charleston    34523

The second DataFrame (df2) is:

State  City
WV     Wheeling
OH     Columbus

I need an operation that returns the following DataFrame:

State   City        Population
NY      Albany      856654
SC      Charleston  35323
WV      Charleston  34523

Essentially, I can't figure out how to "subtract" df2 from df1 based on the two columns State and City (I need both, since the same city name can appear in different states).
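To make the question reproducible, the two frames can be built directly. This sketch assumes Population is an integer column and uses Columbus for the OH row of df2, which is what the expected output implies. The last lines illustrate why both columns are needed: filtering on City alone removes a city name in every state, so a hypothetical removal row (WV, Charleston) would also remove SC Charleston. The name remove_wv_charleston is illustrative only.

```python
import pandas as pd

# df1 from the question: every (State, City) pair with its population
df1 = pd.DataFrame({
    "State": ["NY", "WV", "SC", "OH", "WV"],
    "City": ["Albany", "Wheeling", "Charleston", "Columbus", "Charleston"],
    "Population": [856654, 23434, 35323, 343534, 34523],
})

# df2 from the question: the (State, City) pairs to subtract from df1
df2 = pd.DataFrame({
    "State": ["WV", "OH"],
    "City": ["Wheeling", "Columbus"],
})

# Hypothetical removal list: only the West Virginia Charleston should go.
remove_wv_charleston = pd.DataFrame({"State": ["WV"], "City": ["Charleston"]})

# Matching on City alone also drops SC Charleston, which is why the
# subtraction has to use State and City together.
city_only = df1[~df1["City"].isin(remove_wv_charleston["City"])]
print(city_only)
```

Here city_only keeps only Albany, Wheeling and Columbus; both Charleston rows are gone.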

Answer 1

Do a left join with indicator=True, which adds a _merge column recording the origin of each row ('left_only', 'right_only' or 'both'), then filter on that indicator:

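(df1.merge(df2, on=['State', 'City'], how='left', indicator=True)
    [lambda x: x['_merge'] == 'left_only']  # keep rows that exist only in df1
    .drop(columns='_merge'))                # remove the helper indicator column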

#State       City   Population
#0  NY      Albany      856654
#2  SC  Charleston       35323
#4  WV  Charleston       34523
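A self-contained sketch of the same left anti-join, wrapped in a small helper. The name anti_join is hypothetical (pandas has no function by that name). Passing on= explicitly makes the key columns obvious, whereas a bare df1.merge(df2) joins on every column the two frames share, which here happens to be State and City. It uses drop(columns=...) because pandas 2.0 and later no longer accept the axis as a positional argument to drop.

```python
import pandas as pd

df1 = pd.DataFrame({
    "State": ["NY", "WV", "SC", "OH", "WV"],
    "City": ["Albany", "Wheeling", "Charleston", "Columbus", "Charleston"],
    "Population": [856654, 23434, 35323, 343534, 34523],
})
df2 = pd.DataFrame({"State": ["WV", "OH"], "City": ["Wheeling", "Columbus"]})


def anti_join(left, right, keys):
    """Return the rows of `left` whose combination of `keys` is absent from `right`."""
    # Merge against the key columns only, so no other column of `right`
    # is pulled into the result.
    merged = left.merge(right[keys], on=keys, how="left", indicator=True)
    # indicator=True adds "_merge"; "left_only" marks rows with no match in `right`.
    return merged[merged["_merge"] == "left_only"].drop(columns="_merge")


result = anti_join(df1, df2, ["State", "City"])
print(result)
```

With the question's data, result keeps NY Albany, SC Charleston and WV Charleston, matching the expected output.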

Answer 2

How about a filter?

# Note: City and State are tested separately, not as (State, City) pairs.
df1[~(df1.City.isin(df2.City) & df1.State.isin(df2.State))]
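Because this filter runs two independent isin checks, it can drop a row whose city appears somewhere in df2 and whose state appears somewhere in df2 even though that pair never occurs there; it happens to give the right answer on the question's data. A minimal sketch of the difference on hypothetical rows (the OH Wheeling row is invented for illustration), plus a pairwise variant that builds a MultiIndex from the key columns with pd.MultiIndex.from_frame and tests whole pairs with MultiIndex.isin:

```python
import pandas as pd

# Hypothetical rows: ("OH", "Wheeling") is not a pair in df2, but "OH" and
# "Wheeling" each appear in df2 on their own.
df1 = pd.DataFrame({
    "State": ["NY", "WV", "OH", "OH"],
    "City": ["Albany", "Wheeling", "Columbus", "Wheeling"],
})
df2 = pd.DataFrame({"State": ["WV", "OH"], "City": ["Wheeling", "Columbus"]})

# Column-by-column filter: State and City are checked independently.
independent = df1[~(df1.City.isin(df2.City) & df1.State.isin(df2.State))]

# Pairwise filter: turn (State, City) into a MultiIndex on both sides and
# test membership of whole pairs.
keys = ["State", "City"]
pair_mask = pd.MultiIndex.from_frame(df1[keys]).isin(
    pd.MultiIndex.from_frame(df2[keys])
)
pairwise = df1[~pair_mask]

print(independent)  # OH Wheeling is dropped even though the pair is not in df2
print(pairwise)     # OH Wheeling is kept
```

On these rows, independent keeps only NY Albany, while pairwise keeps NY Albany and OH Wheeling.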

Answer 1: 7 votes, accepted, posted 2017-02-23 20:37:54
Answer 2: 1 vote, posted 2017-02-23 22:33:05