Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset
I'm running Python 2.7 with the Pandas 0.11.0 library installed.
I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.
Let's say my data, in df1, looks like the following:
df1=
zip x y access
123 1 1 4
123 1 1 6
133 1 2 3
145 2 2 3
167 3 1 1
167 3 1 2
Using, for instance, df2 = df1[df1['zip'] == 123]
and then appending the zip-133 rows with df2 = pd.concat([df2, df1[df1['zip'] == 133]])
I get the following subset of data:
df2=
zip x y access
123 1 1 4
123 1 1 6
133 1 2 3
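A minimal sketch reproducing the data above, assuming a recent pandas rather than the 0.11.0 mentioned in the question. `DataFrame.join` lines frames up side by side on the index (and refuses overlapping column names without suffixes), so stacking the zip-123 and zip-133 rows is done here with `pd.concat`. The rows keep their original index labels 0-2, which matters when removing them from df1 later.

```python
import pandas as pd

# Rebuild df1 from the table in the question (default index 0..5).
df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# Stack the matching rows vertically; concat keeps each row's original index label.
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])
print(df2)
```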
What I want to do is either:
1) Remove the rows from df1 as they are defined/joined into df2
OR
2) After df2 has been created, remove from df1 the rows (the difference?) that df2 is composed of.
Hope all of that makes sense. Please let me know if any more info is needed.
EDIT:
Ideally, a third DataFrame would be created that looks like this:
df3=
zip x y access
145 2 2 3
167 3 1 1
167 3 1 2
That is, everything from df1 that is not in df2. Thanks!
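For option 2, one way to get this third frame is `DataFrame.drop` with df2's index. This sketch assumes df2 was built by filtering/concatenating df1, so its rows still carry df1's index labels; if df2's index had been reset, the labels would no longer identify the right rows. The name df3 is chosen here for the third frame, and the boolean-index variant is shown only for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])

# Drop from df1 every row whose index label also appears in df2.
df3 = df1.drop(df2.index)

# Equivalent selection with a boolean mask over the index.
df3_alt = df1[~df1.index.isin(df2.index)]
print(df3)
```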
Two options come to mind. First, use isin and a boolean mask:
>>> df
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
>>> df_no
zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
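When the subset already exists as df2, as in the question, the zip codes do not have to be typed into `keep` by hand; `isin` also accepts a Series. A sketch, assuming (as in the `keep` example above) that membership is decided by the `zip` column alone:

```python
import pandas as pd

df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
df2 = df1[df1["zip"].isin([123, 133])]

# True where a row's zip occurs anywhere in df2["zip"].
in_df2 = df1["zip"].isin(df2["zip"])

# Inverting the mask gives the rows of df1 that are not in df2.
df_no = df1[~in_df2]
print(df_no)
```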
Second, use groupby:
>>> grouped = df.groupby(df['zip'].isin(keep))
and then any of the following:
>>> grouped.get_group(True)
zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3
>>> grouped.get_group(False)
zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2
>>> [g for k,g in list(grouped)]
[ zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3]
>>> dict(list(grouped))
{False: zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, True: zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3}
>>> dict(list(grouped)).values()
[ zip x y access
3 145 2 2 3
4 167 3 1 1
5 167 3 1 2, zip x y access
0 123 1 1 4
1 123 1 1 6
2 133 1 2 3]
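The session above is Python 2. In Python 3, `dict.values()` returns a view rather than a list, so looking the two groups up by key is clearer. A sketch, assuming both a True and a False group exist (a missing key raises `KeyError` here):

```python
import pandas as pd

df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
keep = [123, 133]

# Group rows by a boolean key: True if the zip is in `keep`, False otherwise.
grouped = df.groupby(df["zip"].isin(keep))

# Materialise the groups once as {key: sub-DataFrame}.
parts = dict(list(grouped))
df_yes = parts[True]
df_no = parts[False]
print(df_no)
```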
Which makes the most sense depends on the context, but I think you get the idea.
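Both options select rows by `zip`. If df2 had been built independently of df1 (so its index labels say nothing about df1) and rows should be matched on all columns, a left merge with `indicator=True` is another route, an anti-join swapped in here rather than anything from the answer. A sketch with two caveats: `merge` returns a fresh 0..n-1 index, and duplicates in df2 are dropped first so they cannot multiply matches. The index labels 10-12 on df2 are made up purely for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})
# Hypothetical df2 built separately, with index labels unrelated to df1's.
df2 = pd.DataFrame(
    {"zip": [123, 123, 133], "x": [1, 1, 1], "y": [1, 1, 2], "access": [4, 6, 3]},
    index=[10, 11, 12],
)

# Left-merge on all shared columns; _merge is "both" for rows also present in df2.
merged = df1.merge(df2.drop_duplicates(), how="left", indicator=True)

# Keep rows found only in df1, then drop the helper column.
df3 = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(df3)
```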