Fill missing combinations with "ones" in a groupby object

I have data in the following format.

date        group   ret
1986-01-31  1       1.3
1986-01-31  1       0.9
1986-01-31  2       1.4
1986-01-31  2       1.6
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.3
1986-02-28  2       1.1

I want to get the average return per date and group, which I compute with:

output = df.groupby(['date', 'group'])['ret'].mean()
output = output.reset_index()

This gives the following output:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  2       1.2

However, since no "ret" was given on 1986-02-28 for groups 1 and 3, the output has no rows for groups 1 and 3 on that date. What I would like is that any combination of date and group with no return in the original dataframe gets a value of "1" in the output. So, the required output is:

date        group   ret
1986-01-31  1       1.1
1986-01-31  2       1.5
1986-01-31  3       1.1
1986-02-28  1       1
1986-02-28  2       1.2
1986-02-28  3       1

What would be a good solution for this problem? Thanks in advance!

We can use pivot_table and then stack:

out = df.pivot_table(index='date',columns='group',values='ret',aggfunc = 'mean').fillna(1).stack().reset_index(name='value')
         date  group  value
0  1986-01-31      1    1.1
1  1986-01-31      2    1.5
2  1986-01-31      3    1.1
3  1986-02-28      1    1.0
4  1986-02-28      2    1.2
5  1986-02-28      3    1.0
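
A close variant of the same idea, as a minimal sketch: start from the groupby mean, then unstack with fill_value=1 and stack back. The sample frame below is rebuilt from the question, and parsing date as datetime is my assumption.

```python
import pandas as pd

# Sample data rebuilt from the question (datetime dates are an assumption).
df = pd.DataFrame({
    "date": pd.to_datetime(["1986-01-31"] * 6 + ["1986-02-28"] * 2),
    "group": [1, 1, 2, 2, 2, 3, 2, 2],
    "ret": [1.3, 0.9, 1.4, 1.6, 1.5, 1.1, 1.3, 1.1],
})

out = (
    df.groupby(["date", "group"])["ret"]
      .mean()
      .unstack(fill_value=1)    # wide: one column per group, gaps become 1
      .stack()                  # back to long: one row per (date, group)
      .reset_index(name="ret")
)
print(out)
```

This keeps the original column name ret and avoids a separate fillna step.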

You can reindex the result of groupby and mean, then fill the null values with ones:

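import pandas as pd

# Assumes df['date'] is datetime64 and every date is a month end.
# On pandas >= 2.2, freq='ME' replaces the deprecated 'M' alias.
output = df.groupby(['date', 'group'])['ret'].mean().reindex(
    pd.MultiIndex.from_product(
        (pd.date_range(df.date.min(), df.date.max(), freq='M'),
         sorted(df.group.unique())),
        names=['date', 'group'],
    )
).fillna(1).reset_index()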

Here is the result for the DataFrame in your question:

        date  group  ret
0 1986-01-31      1  1.1
1 1986-01-31      2  1.5
2 1986-01-31      3  1.1
3 1986-02-28      1  1.0
4 1986-02-28      2  1.2
5 1986-02-28      3  1.0
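
If the dates are plain strings, or you only want the dates that actually occur in the data rather than a full month-end range, a variation is to build the index from the observed values. This is a sketch under that assumption, with the sample frame rebuilt from the question using string dates:

```python
import pandas as pd

# Sample data rebuilt from the question; dates are left as strings on purpose.
df = pd.DataFrame({
    "date": ["1986-01-31"] * 6 + ["1986-02-28"] * 2,
    "group": [1, 1, 2, 2, 2, 3, 2, 2],
    "ret": [1.3, 0.9, 1.4, 1.6, 1.5, 1.1, 1.3, 1.1],
})

# Every observed date paired with every observed group.
full_index = pd.MultiIndex.from_product(
    [df["date"].unique(), sorted(df["group"].unique())],
    names=["date", "group"],
)

output = (
    df.groupby(["date", "group"])["ret"]
      .mean()
      .reindex(full_index, fill_value=1)   # new combinations get 1 directly
      .reset_index()
)
print(output)
```

Note that fill_value only applies to combinations added by the reindex, so a group whose mean is already NaN would stay NaN.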

You could use the complete function from pyjanitor to expose the implicitly missing combinations, and then fillna with 1:

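# pip install pyjanitor
import janitor
(df.groupby(['date', 'group'], as_index = False)
   .ret
   .mean()
   # pass the columns as separate arguments; in recent pyjanitor releases a
   # single list is treated as one nested group of existing combinations
   .complete('date', 'group')
   .fillna(1)
 )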

         date  group  ret
0  1986-01-31      1  1.1
1  1986-01-31      2  1.5
2  1986-01-31      3  1.1
3  1986-02-28      1  1.0
4  1986-02-28      2  1.2
5  1986-02-28      3  1.0
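
If you would rather not add a dependency, roughly the same result can be had in plain pandas with a cross join (how="cross" needs pandas 1.2 or later). This is only a sketch of the idea, not what pyjanitor does internally; the sample frame is rebuilt from the question and the names grid, dates and groups are mine.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "date": ["1986-01-31"] * 6 + ["1986-02-28"] * 2,
    "group": [1, 1, 2, 2, 2, 3, 2, 2],
    "ret": [1.3, 0.9, 1.4, 1.6, 1.5, 1.1, 1.3, 1.1],
})

means = df.groupby(["date", "group"], as_index=False)["ret"].mean()

# All date x group combinations via a cross join of the distinct values.
dates = means[["date"]].drop_duplicates()
groups = means[["group"]].drop_duplicates()
grid = dates.merge(groups, how="cross")

out = (
    grid.merge(means, on=["date", "group"], how="left")  # missing combos -> NaN
        .fillna({"ret": 1})
        .sort_values(["date", "group"], ignore_index=True)
)
print(out)
```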

Alternatively, you could convert the group column to a categorical dtype; all categories are then kept during the groupby:

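from pandas.api.types import CategoricalDtype
(df
 .astype({"group": CategoricalDtype(categories=df.group.unique())})
 # observed=False keeps unobserved categories; set it explicitly because
 # newer pandas versions no longer default to it
 .groupby(['date', 'group'], as_index = False, observed = False)
 .ret
 .mean()
 .fillna(1)
 )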

         date group  ret
0  1986-01-31     1  1.1
1  1986-01-31     2  1.5
2  1986-01-31     3  1.1
3  1986-02-28     1  1.0
4  1986-02-28     2  1.2
5  1986-02-28     3  1.0
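
To see why the categorical trick works, it may help to compare the two settings of the observed parameter side by side. A small sketch, with the sample frame rebuilt from the question:

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "date": ["1986-01-31"] * 6 + ["1986-02-28"] * 2,
    "group": [1, 1, 2, 2, 2, 3, 2, 2],
    "ret": [1.3, 0.9, 1.4, 1.6, 1.5, 1.1, 1.3, 1.1],
})
df["group"] = df["group"].astype("category")

by = ["date", "group"]
# observed=False: every category is paired with every date, empty cells are NaN.
kept = df.groupby(by, observed=False)["ret"].mean()
# observed=True: only combinations present in the data survive.
dropped = df.groupby(by, observed=True)["ret"].mean()

print(len(kept), len(dropped))
print(kept.fillna(1).reset_index())
```

The NaN cells in kept are exactly the missing combinations, which is why a final fillna(1) gives the required output.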
