简体   繁体   中英

Python Pandas - Removing Rows From A DataFrame Based on a Previously Obtained Subset

I'm running Python 2.7 with the Pandas 0.11.0 library installed.

I've been looking around a haven't found an answer to this question, so I'm hoping somebody more experienced than I has a solution.

Lets say my data, in df1, looks like the following:

df1=

  zip  x  y  access
  123  1  1    4
  123  1  1    6
  133  1  2    3
  145  2  2    3
  167  3  1    1
  167  3  1    2

Using, for instance, df2 = df1[df1['zip'] == 123] and then df2 = df2.join(df1[df1['zip'] == 133]) I get the following subset of data:

df2=

 zip  x  y  access
 123  1  1    4
 123  1  1    6
 133  1  2    3

What I want to do is either:

1) Remove the rows from df1 as they are defined/joined with df2

OR

2) After df2 has been created, remove the rows (difference?) from df1 which df2 is composed of

Hope all of that makes sense. Please let me know if any more info is needed.

EDIT:

Ideally a third dataframe would be create that looks like this:

df2=

 zip  x  y  access
 145  2  2    3
 167  3  1    1
 167  3  1    2

That is, everything from df1 not in df2 . Thanks!

Two options come to mind. First, use isin and a mask:

>>> df
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> keep = [123, 133]
>>> df_yes = df[df['zip'].isin(keep)]
>>> df_no = df[~df['zip'].isin(keep)]
>>> df_yes
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> df_no
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2

Second, use groupby :

>>> grouped = df.groupby(df['zip'].isin(keep))

and then any of

>>> grouped.get_group(True)
   zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3
>>> grouped.get_group(False)
   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2
>>> [g for k,g in list(grouped)]
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]
>>> dict(list(grouped))
{False:    zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2, True:    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3}
>>> dict(list(grouped)).values()
[   zip  x  y  access
3  145  2  2       3
4  167  3  1       1
5  167  3  1       2,    zip  x  y  access
0  123  1  1       4
1  123  1  1       6
2  133  1  2       3]

Which makes most sense depends upon the context, but I think you get the idea.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM