I cannot for the life of me figure out why the filter method refuses to work on my dataframes in pandas.
Here is an example showing my issue:
In [99]: dff4
Out[99]: <pandas.core.groupby.DataFrameGroupBy object at 0x1143cbf90>
In [100]: dff3
Out[100]: <pandas.core.groupby.DataFrameGroupBy object at 0x11439a810>
In [101]: dff3.groups
Out[101]:
{'iphone': [85373, 85374],
 'remote_api_created': [85363, 85364, 85365, 85412]}
In [102]: dff4.groups
Out[102]: {'bye': [3], 'bye bye': [4], 'hello': [0, 1, 2]}
In [103]: dff4.filter(lambda x: len(x) >2)
Out[103]:
   A      B
0  0  hello
1  1  hello
2  2  hello
In [104]: dff3.filter(lambda x: len(x) >2)
Out[104]:
Empty DataFrame
Columns: [source]
Index: []
Notice how filter refuses to work on dff3.
Any help appreciated.
If you group by column name, that column is moved into the index, so the dataframe becomes empty when no other columns are present. See:
>>> def report(x):
...     print(x)
...     return True
...
>>> df
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85373              iphone
85374              iphone
85412  remote_api_created
>>> df.groupby('source').filter(report)
Series([], dtype: float64)
Empty DataFrame
Columns: []
Index: [85373, 85374]
Series([], dtype: float64)
Empty DataFrame
Columns: [source]
Index: []
You can group by column values:
>>> df.groupby(df['source']).filter(lambda x: len(x)>2)
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85412  remote_api_created
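For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt by hand from the output shown in the question, and the names `kept` and `kept_by_count` are only illustrative. It groups by the column's values as above, then does the same selection without `filter` by mapping each row to its group size. Newer pandas releases may also keep the column when grouping by name, but I have not checked that against every version, so treat it as an assumption.

```python
import pandas as pd

# Rebuild the single-column frame from the question (values copied from the post).
df = pd.DataFrame(
    {"source": ["remote_api_created"] * 3 + ["iphone"] * 2 + ["remote_api_created"]},
    index=[85363, 85364, 85365, 85373, 85374, 85412],
)

# Group by the column's values (a Series) rather than by its name,
# so the 'source' column is still present in each group passed to filter.
kept = df.groupby(df["source"]).filter(lambda g: len(g) > 2)

# Alternative without filter: look up each row's group size and mask on it.
counts = df["source"].map(df["source"].value_counts())
kept_by_count = df[counts > 2]

print(kept)
```

Both approaches should keep only the rows whose `source` value occurs more than twice, in their original order.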