I have two dataframes. First there is DF1:
ID Other value
1 a
2 b
3 c
and then there is DF2, which is a subset of DF1:
ID Other value
1 a
I want to create a third dataframe that is the equivalent of SQL's MINUS: dropping all the observations that are in the intersection of the two dataframes. This would leave me with DF3:
ID Other value
2 b
3 c
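SQL's MINUS compares whole rows, not just one key column. Below is a minimal sketch of that full-row version, using a left merge with `indicator=True` (an anti-join, a technique none of the answers here use). The helper name `minus` is hypothetical, and the frames are rebuilt from the tables above on the assumption that `Other value` is a single column:

```python
import pandas as pd

# Rebuild the example frames; "Other value" is assumed to be one column name.
df1 = pd.DataFrame({"ID": [1, 2, 3], "Other value": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [1], "Other value": ["a"]})

def minus(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `left` with no exact match in `right`."""
    merged = left.merge(
        right.drop_duplicates(),  # avoid repeating left rows on duplicate matches
        how="left",               # keep every left row, in its original order
        indicator=True,           # adds "_merge": "both" or "left_only"
    )
    only_left = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    # MINUS returns distinct rows, so drop duplicates as well
    return only_left.drop_duplicates().reset_index(drop=True)

df3 = minus(df1, df2)
print(df3)
```

With no `on` argument, `merge` joins on all shared columns, which is what makes this a whole-row comparison.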
I've been trying to use pandasql, but it doesn't seem to like my SQL. The code is as follows:
from pandasql import *
import pandas as pd
pysqldf = lambda q: sqldf(q, globals())
train = pysqldf(""" SELECT * FROM DF1 WHERE ID
NOT IN (SELECT ID FROM DF2) """)
I get the error:
Error on sql SELECT * FROM DF1 WHERE ID
NOT IN (SELECT ID FROM DF2)
Any ideas on what is going wrong, or how I might achieve this quickly with some other pandas functionality? I can do exactly the same thing in R without any problems.
This should do it:
df1[df1.ID.isin(df2.ID) == False]
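A self-contained version of this answer, rebuilding the frames from the question (the column names are assumed from the tables). It is the pandas counterpart of `WHERE ID NOT IN (SELECT ID FROM DF2)`; `~` negates the boolean mask and selects the same rows as comparing it with `== False`:

```python
import pandas as pd

df1 = pd.DataFrame({"ID": [1, 2, 3], "Other value": ["a", "b", "c"]})
df2 = pd.DataFrame({"ID": [1], "Other value": ["a"]})

# Boolean mask: True where df1's ID also appears in df2's ID column
in_df2 = df1["ID"].isin(df2["ID"])

# ~ negates the mask, so this keeps only rows whose ID is absent from df2
df3 = df1[~in_df2]
print(df3)
```

Only the `ID` column is compared here; the other columns are ignored, which matches the original SQL.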
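You can subtract the Indexes, which in pandas 0.13 performs a set difference (newer versions deprecated `-` for this and use `df1.index.difference(df2.index)` instead):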
In [11]: df1
Out[11]:
Other value
ID
1 a
2 b
3 c
In [12]: df2
Out[12]:
Other value
ID
1 a
In [13]: df1.index - df2.index
Out[13]: Int64Index([2, 3], dtype=int64)
In [14]: df1.loc[df1.index - df2.index] # assuming IDs are unique
Out[14]:
Other value
ID
2 b
3 c
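A self-contained sketch of the same idea for current pandas, where `-` on an `Index` no longer means set difference, so `Index.difference` is used instead. As in the transcript, `ID` is the index and the IDs are assumed to be unique:

```python
import pandas as pd

# Same data as the transcript, with ID as the index
df1 = pd.DataFrame({"Other value": ["a", "b", "c"]},
                   index=pd.Index([1, 2, 3], name="ID"))
df2 = pd.DataFrame({"Other value": ["a"]},
                   index=pd.Index([1], name="ID"))

# Set difference of the index labels (sorted by default)
remaining = df1.index.difference(df2.index)

# Select those rows by label; assumes IDs are unique
df3 = df1.loc[remaining]
print(df3)
```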
Another option, available from pandas 0.13, is the DataFrame `isin` method:
In [21]: df1.isin(df2)
Out[21]:
Other value
ID
1 True
2 False
3 False
In [22]: df1[~df1.isin(df2).all(1)]
Out[22]:
Other value
ID
2 b
3 c
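A runnable version of this approach. When `isin` is given a DataFrame, it matches on both index and column labels, so a cell is True only if DF2 holds the same value at the same `ID` and column; that is why `ID` is set as the index here (an assumption carried over from the transcript):

```python
import pandas as pd

df1 = pd.DataFrame({"Other value": ["a", "b", "c"]},
                   index=pd.Index([1, 2, 3], name="ID"))
df2 = pd.DataFrame({"Other value": ["a"]},
                   index=pd.Index([1], name="ID"))

# Element-wise membership, aligned on index and columns
mask = df1.isin(df2)

# A row is "in df2" only if every column matched; keep all other rows
df3 = df1[~mask.all(axis=1)]
print(df3)
```

`axis=1` is the keyword form of the `.all(1)` call above: it reduces across the columns of each row.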