Pandas groupby dropna=False does not work for apply

Question

Suppose I have the following dataframe.

import pandas as pd

df = pd.DataFrame({'a':[None,None,None], 'b':[1,1,2], 'c': [1,1,3], 'd': [1,1,1]})

df.groupby(['a', 'b', 'c'], dropna=True).d.sum()
=> Series([], Name: d, dtype: int64)

df.groupby(['a', 'b', 'c'], dropna=False).d.sum()
=> a    b  c
   NaN  1  1    2
        2  3    1
   Name: d, dtype: int64

The output respects the dropna flag as expected.
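A self-contained version of the two sum calls above, runnable as a script. It only restates what the question shows: with `dropna=True` every row is dropped because column `a` is entirely missing, while `dropna=False` keeps NaN as a group label. The `dropna` argument to `groupby` exists from pandas 1.1 onward.

```python
import pandas as pd

# Same frame as in the question: column 'a' is entirely missing.
df = pd.DataFrame({'a': [None, None, None], 'b': [1, 1, 2], 'c': [1, 1, 3], 'd': [1, 1, 1]})

# dropna=True (the default) drops every row whose group key contains NaN,
# so with 'a' all-missing no groups are left.
dropped = df.groupby(['a', 'b', 'c'], dropna=True).d.sum()

# dropna=False keeps NaN as a group label.
kept = df.groupby(['a', 'b', 'c'], dropna=False).d.sum()

print(dropped)
print(kept)
```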

Now, I define a custom function to apply.

def _is_outlier(s):
    lower_limit = s.mean() - (s.std() * 2)
    upper_limit = s.mean() + (s.std() * 2)
    return ~s.between(lower_limit, upper_limit)
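As a sanity check outside of groupby, the rule in `_is_outlier` can be exercised on plain Series. One detail matters for the question's data: a group with a single row has a NaN sample standard deviation (pandas uses `ddof=1`), so both limits are NaN, `between()` returns False, and that row is flagged as an outlier. The test values below are illustrative only.

```python
import pandas as pd


def _is_outlier(s):
    # Same rule as in the question: outside mean +/- 2 sample standard deviations.
    lower_limit = s.mean() - (s.std() * 2)
    upper_limit = s.mean() + (s.std() * 2)
    return ~s.between(lower_limit, upper_limit)


# Two identical values: std is 0, both sit exactly on the (inclusive) limits.
pair = _is_outlier(pd.Series([1, 1])).tolist()          # [False, False]

# A single value: std is NaN, so the limits are NaN and the value is flagged.
single = _is_outlier(pd.Series([1])).tolist()           # [True]

# Nine zeros and one 10: only the 10 lies above mean + 2*std (about 7.32).
spike = _is_outlier(pd.Series([0] * 9 + [10])).tolist()

print(pair, single, spike)
```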

df.groupby(['a', 'b', 'c'], dropna=False).d.apply(_is_outlier)
=> Series([], Name: d, dtype: bool)

df.groupby(['a', 'b', 'c'], dropna=True).d.apply(_is_outlier)
=> Series([], Name: d, dtype: bool)

Both return an empty Series. It looks like dropna does not work as expected with apply.
Does anybody know a workaround for this issue?

Thanks,

Answer 1

It looks like this is a bug that was fixed in pandas 1.3.3. From the release notes:

Fixed regression in GroupBy.apply() where nan values were dropped even with dropna=False (GH43205)

Can you try to update pandas and check if you still have this issue?
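A minimal sketch for checking this after upgrading. The version parsing helper `pandas_version` and the `'<missing>'` sentinel fallback are my own assumptions, not part of the release notes: on 1.3.3 or later it calls `apply` with `dropna=False` directly, and on older versions it fills the missing key values with a sentinel label before grouping, so no group key is NaN at all (choose a sentinel that cannot collide with real key values). `group_keys=False` is also an addition here, so the result stays aligned with `df`'s original row index.

```python
import re

import pandas as pd


def _is_outlier(s):
    # Flag values more than two sample standard deviations from the mean.
    lower_limit = s.mean() - (s.std() * 2)
    upper_limit = s.mean() + (s.std() * 2)
    return ~s.between(lower_limit, upper_limit)


def pandas_version():
    # Hypothetical helper: keep the leading integer of the first three
    # components, e.g. "2.2.2" -> (2, 2, 2), "2.0.0rc1" -> (2, 0, 0).
    parts = pd.__version__.split(".")[:3]
    return tuple(int(re.match(r"\d+", p).group()) for p in parts)


keys = ['a', 'b', 'c']
df = pd.DataFrame({'a': [None, None, None], 'b': [1, 1, 2], 'c': [1, 1, 3], 'd': [1, 1, 1]})

if pandas_version() >= (1, 3, 3):
    # Fixed versions: groups with NaN keys survive apply when dropna=False.
    grouped = df.groupby(keys, dropna=False, group_keys=False)
else:
    # Assumed fallback for older versions: replace missing keys with a
    # sentinel label so no group key is NaN.
    grouped = df.fillna({k: '<missing>' for k in keys}).groupby(keys, group_keys=False)

result = grouped.d.apply(_is_outlier)
print(result)
```

On a fixed version the result should be non-empty: rows 0 and 1 (group `(NaN, 1, 1)`) are not flagged, and row 2 is flagged because it is the only row in its group, so its standard deviation is NaN.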

Answer 2

Does it solve it if you make it a DataFrame and then reset the index?

pd.DataFrame(df.groupby(['a', 'b', 'c'], dropna=False).d.sum()).reset_index()
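A runnable version of the suggestion above. `Series.reset_index()` already returns a DataFrame, so the `pd.DataFrame(...)` wrapper is optional; either way the NaN group from `dropna=False` becomes an ordinary row with NaN in column `a`. Note that this flattens the result of `sum`; the line above does not call `apply`.

```python
import pandas as pd

df = pd.DataFrame({'a': [None, None, None], 'b': [1, 1, 2], 'c': [1, 1, 3], 'd': [1, 1, 1]})

# reset_index() turns the (a, b, c) MultiIndex into ordinary columns,
# keeping the NaN group produced by dropna=False as a regular row.
flat = df.groupby(['a', 'b', 'c'], dropna=False).d.sum().reset_index()
print(flat)
```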

2 answers

solution1
1 2021-11-03 08:25:20

solution2
0 2021-11-03 08:43:33
