Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around and haven't found an answer to this question, so I'm hoping somebody more experienced than I am has a solution.

Let's say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]) I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3
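
As an aside, DataFrame.join aligns on the index and adds columns, so stacking two row selections is more commonly written with pd.concat. The following is only a sketch of one way the subset above could be built; the sample values are copied from the tables in this question, and the use of pd.concat is an assumption about the intent rather than the code that was actually run.

```python
import pandas as pd

# Sample data copied from the df1 table above.
df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# Stack the two row selections vertically; each piece keeps its
# original row labels from df1.
df2 = pd.concat([df1[df1["zip"] == 123], df1[df1["zip"] == 133]])
```

Because the original row labels are preserved, df2 can still be matched back against df1 later.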

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined into df2

OR

2) After df2 has been created, remove from df1 the rows that df2 is composed of (a difference?)

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be created that looks like this:

df3=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 that is not in df2. Thanks!

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
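
For anyone who wants to run this outside the interpreter session, here is a minimal self-contained sketch of the same isin approach. The DataFrame is rebuilt by hand from the table in the question, and the name mask is introduced here only for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Build the boolean mask once and reuse it for both halves.
mask = df["zip"].isin(keep)
df_yes = df[mask]    # rows whose zip is in keep
df_no = df[~mask]    # every other row
```

Computing the mask once avoids repeating the isin call and makes it obvious that the two results partition df.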

Second, use groupby:

>>> grouped = df.groupby(df['zip'].isin(keep))

and then use any of the following:

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
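
The session above was run under Python 2, where dict.values() returns a list; under Python 3 it returns a view object instead. As a rough sketch of the same groupby idea in script form, the snippet below collects the groups into a dict and falls back to an empty frame when one side has no rows. The names parts and empty are hypothetical and used only for this example.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

keep = [123, 133]

# Group by the boolean "is this zip in keep?" Series.
grouped = df.groupby(df["zip"].isin(keep))

# Iterating a GroupBy yields (key, sub-DataFrame) pairs.
parts = {bool(key): group for key, group in grouped}

# Fall back to an empty frame with the same columns if a group is missing,
# which would happen if every row (or no row) matched keep.
empty = df.iloc[0:0]
df_yes = parts.get(True, empty)
df_no = parts.get(False, empty)
```

The fallback matters because get_group raises a KeyError when the requested group does not exist.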

Which makes the most sense depends on the context, but I think you get the idea.
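
If df2 already exists and the list of zip codes is no longer at hand, the rows can also be removed by index label. This is only a sketch, and it assumes that df2 was taken from df1 by row selection (so it still carries df1's row labels) and that df1's index is unique.

```python
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({
    "zip":    [123, 123, 133, 145, 167, 167],
    "x":      [1, 1, 1, 2, 3, 3],
    "y":      [1, 1, 2, 2, 1, 1],
    "access": [4, 6, 3, 3, 1, 2],
})

# A previously obtained subset; boolean indexing keeps df1's row labels.
df2 = df1[df1["zip"].isin([123, 133])]

# Drop from df1 every row whose index label appears in df2.
df3 = df1.drop(df2.index)

# Equivalent mask-based form using the index instead of a column.
df3_alt = df1[~df1.index.isin(df2.index)]
```

If the subset's index has been reset or the index of df1 contains duplicates, matching on labels is no longer reliable and the column-based isin mask is the safer choice.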
