pandas group filter issue

I cannot for the life of me figure out why the filter method refuses to work on my dataframes in pandas.

Here is an example showing my issue:

In [99]: dff4
Out[99]: <pandas.core.groupby.DataFrameGroupBy object at 0x1143cbf90>

In [100]: dff3
Out[100]: <pandas.core.groupby.DataFrameGroupBy object at 0x11439a810>

In [101]: dff3.groups
Out[101]: 
{'iphone': [85373, 85374],
 'remote_api_created': [85363,
  85364,
  85365,
  85412]}

In [102]: dff4.groups
Out[102]: {'bye': [3], 'bye bye': [4], 'hello': [0, 1, 2]}

In [103]: dff4.filter(lambda x: len(x) >2)
Out[103]: 
   A      B
0  0  hello
1  1  hello
2  2  hello

In [104]: dff3.filter(lambda x: len(x) >2)
Out[104]: 
Empty DataFrame
Columns: [source]
Index: []

Notice how filter refuses to work on dff3.

Any help appreciated.

When you group by a column name, that column is moved into the index. If the dataframe has no other columns, each group handed to filter has no columns left, and the result comes back empty. See:

>>> def report(x):
...     print(x)
...     return True
>>> df
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85373              iphone
85374              iphone
85412  remote_api_created

>>> df.groupby('source').filter(report)
Series([], dtype: float64)
Empty DataFrame
Columns: []
Index: [85373, 85374]
Series([], dtype: float64)
Empty DataFrame
Columns: [source]
Index: []

Instead, group by the column's values by passing the Series itself rather than its name:

>>> df.groupby(df['source']).filter(lambda x: len(x)>2)
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85412  remote_api_created
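
For anyone who wants to reproduce this, here is a minimal sketch. It assumes the dataframe is exactly the single-column frame printed above; the names by_filter, counts and by_mask are only illustrative. The second variant is not from the original answer: it is a swapped-in alternative that builds a boolean mask from per-value counts, so it does not depend on how a given pandas version passes the grouping column to the filter callback.

```python
import pandas as pd

# Rebuild the single-column frame shown in the answer (assumed to be the full data).
df = pd.DataFrame(
    {"source": ["remote_api_created"] * 3 + ["iphone"] * 2 + ["remote_api_created"]},
    index=[85363, 85364, 85365, 85373, 85374, 85412],
)

# Group by the Series itself, as suggested above, and keep groups with more than 2 rows.
by_filter = df.groupby(df["source"]).filter(lambda g: len(g) > 2)

# Alternative without filter: map each row to the size of its group, then mask.
counts = df["source"].map(df["source"].value_counts())
by_mask = df[counts > 2]

print(by_filter)
print(by_mask)
```

Both variants should keep only the remote_api_created rows, since that group has four members and iphone has two.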
