Fill missing combinations with "ones" in a groupby object

Question

I have data in the following format.

date        group   ret
1986-01-31  1       1.3
1986-01-31  1       0.9
1986-01-31  2       1.4
1986-01-31  2       1.6
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.3
1986-02-28  2       1.1

I want to get the average return per date and group, which I get by doing:

output = df.groupby(['date', 'group'])['ret'].mean() + 1 
output = output.reset_index()

Which gives the following output:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.2

However, since no "ret" was given at date 1986-02-28 for class 1 and 3, there is no row in the output for class 1 and 3 on this date. What I would like is that for any combination of dates and classes where no return is given in the original dataframe, this combination gets and output of "1" in the output. So, the required output is:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  1       1
1986-02-28  2       1.2
1986-02-28  3       1

What would be a good solution for this problem? Thanks in advance!

Answer 1

We can do pivot_table then stack

out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0

Answer 2

You can reindex the result of the groupby and mean and fill the null values with ones:

output = df.groupby(['date', 'group'])['ret'].mean().reindex(
    pd.MultiIndex.from_product(
        (pd.date_range(df.date.min(), df.date.max(), freq='M'),
         sorted(df.group.unique())),
        names=['date', 'group'],
    )
).fillna(1).reset_index()

Here the result for the DataFrame in your question:

        date  group  ret
0 1986-01-31      1  1.1
1 1986-01-31      2  1.5
2 1986-01-31      3  1.1
3 1986-02-28      1  1.0
4 1986-02-28      2  1.2
5 1986-02-28      3  1.0

Answer 3

You could use the complete function from pyjanitor to expose the explicitly missing values, and fillna with 1 :

# pip install pyjanitor
import janitor
(df.groupby(['date', 'group'], as_index = False)
   .ret
   .mean()
   .complete(['date', 'group'])
   .fillna(1)
 )

         date  group  ret
0  1986-01-31      1  1.1
1  1986-01-31      2  1.5
2  1986-01-31      3  1.1
3  1986-02-28      1  1.0
4  1986-02-28      2  1.2
5  1986-02-28      3  1.0

Alternatively, you could convert the group column to a categorical dtype , all categories will be maintained during the groupby:

from pandas.api.types import CategoricalDtype
(df
 .astype({"group": CategoricalDtype(categories=df.group.unique())})
 .groupby(['date', 'group'], as_index = False)
 .ret
 .mean()
 .fillna(1)
 )

         date group  ret
0  1986-01-31     1  1.1
1  1986-01-31     2  1.5
2  1986-01-31     3  1.1
3  1986-02-28     1  1.0
4  1986-02-28     2  1.2
5  1986-02-28     3  1.0

Fill missing combinations with "ones" in a groupby object

Question

3 answers

solution1
4 ACCPTED 2021-04-18 14:03:26

solution2
2 2021-04-18 14:42:51

solution3
2 2021-04-18 15:01:14

Fill missing combinations with "ones" in a groupby object

Question

3 answers

solution1 4 ACCPTED 2021-04-18 14:03:26

solution2 2 2021-04-18 14:42:51

solution3 2 2021-04-18 15:01:14

solution1
4 ACCPTED 2021-04-18 14:03:26

solution2
2 2021-04-18 14:42:51

solution3
2 2021-04-18 15:01:14