
pandas group filter issue

I cannot for the life of me figure out why the filter method refuses to work on my DataFrames in pandas.

Here is an example showing my issue:

In [99]: dff4
Out[99]: <pandas.core.groupby.DataFrameGroupBy object at 0x1143cbf90>

In [100]: dff3
Out[100]: <pandas.core.groupby.DataFrameGroupBy object at 0x11439a810>

In [101]: dff3.groups
Out[101]: 
{'iphone': [85373, 85374],
 'remote_api_created': [85363,
  85364,
  85365,
  85412]}

In [102]: dff4.groups
Out[102]: {'bye': [3], 'bye bye': [4], 'hello': [0, 1, 2]}

In [103]: dff4.filter(lambda x: len(x) >2)
Out[103]: 
   A      B
0  0  hello
1  1  hello
2  2  hello

In [104]: dff3.filter(lambda x: len(x) >2)
Out[104]: 
Empty DataFrame
Columns: [source]
Index: []

Notice how filter refuses to work on dff3.

Any help appreciated.

If you group by a column name, that column is moved to the index, so the frame handed to your filter function is empty when no other columns are present. See:

>>> def report(x):
...     print(x)
...     return True
>>> df
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85373              iphone
85374              iphone
85412  remote_api_created

>>> df.groupby('source').filter(report)
Series([], dtype: float64)
Empty DataFrame
Columns: []
Index: [85373, 85374]
Series([], dtype: float64)
Empty DataFrame
Columns: [source]
Index: []
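
To check what your own pandas version hands to the filter function, a small inspection helper along these lines may help. This is only a sketch: the helper name `inspect_group` and the `seen` list are made up for illustration, the frame is rebuilt from the data shown above, and what gets printed depends on the installed pandas version, so no expected output is shown.

```python
import pandas as pd

# Rebuild the single-column frame from the example above.
df = pd.DataFrame(
    {"source": ["remote_api_created"] * 3 + ["iphone"] * 2 + ["remote_api_created"]},
    index=[85363, 85364, 85365, 85373, 85374, 85412],
)

seen = []  # hypothetical log: (column names, row count) for each call


def inspect_group(group):
    # Record what filter() passed in; fall back to an empty column list
    # if the object has no .columns attribute (for example, a Series).
    seen.append((list(getattr(group, "columns", [])), len(group)))
    return True  # keep every group so only the inspection matters


result = df.groupby("source").filter(inspect_group)

for columns, n_rows in seen:
    print(columns, n_rows)
```

If the printed column list is empty or the row count is not the group size, the filter condition `len(x) > 2` is being evaluated on something other than the rows you expected.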

Instead, you can group by the column's values by passing the Series itself:

>>> df.groupby(df['source']).filter(lambda x: len(x)>2)
                   source
85363  remote_api_created
85364  remote_api_created
85365  remote_api_created
85412  remote_api_created
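
For reference, here is a self-contained sketch of the approach above, rebuilt from the data shown in the question. The variable names `kept` and `kept_alt` are mine, and the second variant (mapping `value_counts()` back onto the column) is a different technique from the one in the answer: it avoids `groupby` entirely, so it should not depend on how a given pandas version treats the grouping column.

```python
import pandas as pd

# Same single-column frame as in the question.
df = pd.DataFrame(
    {"source": ["remote_api_created"] * 3 + ["iphone"] * 2 + ["remote_api_created"]},
    index=[85363, 85364, 85365, 85373, 85374, 85412],
)

# Group by the column's values (a Series) rather than by its name,
# then keep only the groups with more than two rows.
kept = df.groupby(df["source"]).filter(lambda g: len(g) > 2)

# Alternative (not from the answer): count each value, map the counts
# back onto the rows, and select with a boolean mask.
counts = df["source"].value_counts()
kept_alt = df[df["source"].map(counts) > 2]

print(kept)
print(kept_alt)
```

Both variants should keep the four `remote_api_created` rows and drop the two `iphone` rows.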


 