How to get pd.Grouper() to include empty groups
I have a dataset that I want to group by a column AND by every month of data in the dataset. I'm using pd.Grouper() for the per-month date part of the grouping:
df.groupby(['A',pd.Grouper(key='date', freq='M')]).agg({'B':list})
But this returns only the months that actually have data for each A, B. I also want every month where there was no data for that A, B combination. I don't see an option for this in the pd.Grouper() documentation.
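A minimal reproduction of the problem, assuming the two-row sample DataFrame used in the answer (one row for A=1 in January, one for A=2 in March). `pd.offsets.MonthEnd()` is passed instead of the string 'M' only because newer pandas versions deprecate 'M' in favour of 'ME'; it selects the same month-end bins.

```python
import pandas as pd

# Sample data: nothing at all in February, and each A has data in only one month.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-03-01"]),
    "A": [1, 2],
    "B": [3, 4],
})

# Group by A and by calendar month (labels are month ends).
res = df.groupby(["A", pd.Grouper(key="date", freq=pd.offsets.MonthEnd())]).agg({"B": list})
print(res)
# Only the (A, month) pairs that occur in the data are returned: two rows,
# and no row for February or for the months where a given A has no data.
```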
Given this DataFrame:
date A B
2018-01-01 1 3
2018-03-01 2 4
After the groupby you can reindex, but unfortunately you need to build the full MultiIndex (every A crossed with every month) yourself:
In [11]: res = df.groupby(['A',pd.Grouper(key='date', freq='M')]).agg({'B':list})
In [12]: m = pd.MultiIndex.from_product([df.A.unique(), pd.date_range(df.date.min(), df.date.max() + pd.offsets.MonthEnd(1), freq='M')])
In [13]: m
Out[13]:
MultiIndex(levels=[[1, 2], [2018-01-31 00:00:00, 2018-02-28 00:00:00, 2018-03-31 00:00:00]],
labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])
In [14]: res.reindex(m)
Out[14]:
B
1 2018-01-31 [3]
2018-02-28 NaN
2018-03-31 NaN
2 2018-01-31 NaN
2018-02-28 NaN
2018-03-31 [4]
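The IPython session above can be collected into one standalone script. This is a sketch of the same groupby-then-reindex idea, with two small assumptions of my own: the levels of the full index are named to match the groupby result, and the last month end is computed with `MonthEnd().rollforward(...)` rather than `+ pd.offsets.MonthEnd(1)`, so a maximum date that already falls on a month end does not add an extra empty month.

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-03-01"]),
    "A": [1, 2],
    "B": [3, 4],
})

month_end = pd.offsets.MonthEnd()  # same bins as freq='M' ('ME' in pandas 2.2+)

# Step 1: ordinary groupby; only observed (A, month) pairs appear.
res = df.groupby(["A", pd.Grouper(key="date", freq=month_end)]).agg({"B": list})

# Step 2: every month end between the first and last date.
# rollforward leaves a date unchanged if it is already a month end,
# otherwise moves it to the end of its month.
last = month_end.rollforward(df["date"].max())
months = pd.date_range(df["date"].min(), last, freq=month_end)

# Step 3: cross every A with every month and reindex; missing pairs become NaN.
full_index = pd.MultiIndex.from_product([df["A"].unique(), months], names=["A", "date"])
out = res.reindex(full_index)
print(out)
```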
Note: filling the NaNs with [] is a little tricky; ideally you'd be able to work around needing it (in general, having lists inside a DataFrame is not recommended).
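One workaround, sketched here as an assumption rather than the answerer's method: `fillna` does not accept a list as the fill value, so replace the missing cells element-wise with `Series.apply`. The frame below is a hypothetical stand-in shaped like the reindexed `Out[14]` result.

```python
import numpy as np
import pandas as pd

# Hypothetical reindexed result, shaped like Out[14]: lists where data exists, NaN elsewhere.
idx = pd.MultiIndex.from_product(
    [[1, 2], pd.date_range("2018-01-31", periods=3, freq=pd.offsets.MonthEnd())],
    names=["A", "date"],
)
out = pd.DataFrame({"B": [[3], np.nan, np.nan, np.nan, np.nan, [4]]}, index=idx)

# Keep existing lists; replace anything else (the NaNs) with a new empty list.
# The lambda builds a fresh [] on each call, so cells do not share one list object.
out["B"] = out["B"].apply(lambda v: v if isinstance(v, list) else [])
print(out)
```

Creating a new list per cell matters because appending to one cell later would otherwise change every "empty" cell at once.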