Pandas performing a SQL subtraction between two dataframes

Question

I have two dataframes. First there is DF1:

ID      Other value
1           a
2           b
3           c

and then there is DF2, which is a subset of DF1:

ID      Other value
1           a

I want to create a third dataframe that would be the equivalent of a minus in SQL: dropping all the observations in the intersection of the two dataframes. This would leave me with DF3:

ID      Other value
2           b
3           c

I've been trying to use pandasql, but it doesn't seem to like my SQL. The code is as follows:

from pandasql import *
import pandas as pd

pysqldf = lambda q: sqldf(q, globals())
train = pysqldf(""" SELECT * FROM DF1 WHERE ID 
                           NOT IN (SELECT ID FROM DF2) """)

I get the error

Error on sql  SELECT * FROM DF1 WHERE ID 
                           NOT IN (SELECT ID FROM DF2)

Any ideas on what is going wrong, or how I might achieve this quickly using some other pandas functionality? I can do the exact same thing in R with no problems.

Answer 1

This should do it:

df1[~df1.ID.isin(df2.ID)]
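
For a self-contained check, here is a minimal sketch that rebuilds the two frames and applies the same filter, with `~` negating the boolean mask. The frame construction is an assumption based on the tables shown in the question, and the name `df3` is only illustrative.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (assumed layout).
df1 = pd.DataFrame({"ID": [1, 2, 3], "Other value": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [1], "Other value": ["a"]})

# Keep only the rows of df1 whose ID does not appear in df2.
df3 = df1[~df1["ID"].isin(df2["ID"])]
print(df3)
```

This matches on the ID column only, so it behaves like the `NOT IN (SELECT ID FROM DF2)` query in the question rather than a comparison of whole rows.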

Answer 2

You can subtract Indexes, which performs a set difference in the pandas version used here:

In [11]: df1
Out[11]: 
   Other value
ID            
1            a
2            b
3            c

In [12]: df2
Out[12]: 
   Other value
ID            
1            a

In [13]: df1.index - df2.index
Out[13]: Int64Index([2, 3], dtype=int64)

In [14]: df1.loc[df1.index - df2.index]  # assuming IDs are unique
Out[14]: 
   Other value
ID            
2            b
3            c
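
In later pandas releases the `-` operator on an Index is no longer a set operation, so the same idea is usually written with `Index.difference`. A minimal sketch, assuming the frames from the question are indexed by ID (the names `remaining` and `df3` are illustrative):

```python
import pandas as pd

# Frames rebuilt from the question, indexed by ID as in the session above.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Other value": ["a", "b", "c"]}).set_index("ID")
df2 = pd.DataFrame({"ID": [1], "Other value": ["a"]}).set_index("ID")

# Set difference of the index labels: IDs in df1 that are not in df2.
remaining = df1.index.difference(df2.index)

# Select those rows; as in the original, this assumes IDs are unique.
df3 = df1.loc[remaining]
print(df3)
```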

Another option available in 0.13 is to use the isin method:

In [21]: df1.isin(df2)
Out[21]: 
   Other value
ID            
1         True
2        False
3        False

In [22]: df1[~df1.isin(df2).all(1)]
Out[22]: 
   Other value
ID            
2            b
3            c
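
The row-wise check can also be tried end to end. This is a sketch under the assumption that both frames are indexed by ID as in the session above. When `isin` is given a DataFrame it compares values at matching index and column labels, so a row is dropped only if every column matches. The names `mask` and `df3` are illustrative.

```python
import pandas as pd

# Frames rebuilt from the question, indexed by ID.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Other value": ["a", "b", "c"]}).set_index("ID")
df2 = pd.DataFrame({"ID": [1], "Other value": ["a"]}).set_index("ID")

# True where df1 and df2 hold the same value at the same (index, column) label.
mask = df1.isin(df2)

# Drop the rows where every column matched, i.e. rows present in both frames.
df3 = df1[~mask.all(axis=1)]
print(df3)
```

Unlike filtering on the ID column alone, this keeps a row of df1 whose ID appears in df2 but whose other values differ.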

2 answers

solution1
5 ACCEPTED 2013-11-18 23:37:30

solution2
2 2013-11-18 23:37:36
