
How to get pd.Grouper() to include empty groups

I have a dataset that I want to group by a column and by every month of data in the dataset. I'm using pd.Grouper() for the per-month date part of the groupby.

df.groupby(['A',pd.Grouper(key='date', freq='M')]).agg({'B':list})

But this returns only the months that actually have data for each A, B. I also want every month in which there was no data for that A, B combination. I don't see an option for this in the pd.Grouper() documentation.

Given this DataFrame:

date        A  B
2018-01-01  1  3
2018-03-01  2  4

After the groupby you can reindex the result to bring in the missing months, but unfortunately you need to create the full MultiIndex yourself:

In [11]: res = df.groupby(['A',pd.Grouper(key='date', freq='M')]).agg({'B':list})

In [12]: m = pd.MultiIndex.from_product([df.A.unique(), pd.date_range(df.date.min(), df.date.max() + pd.offsets.MonthEnd(1), freq='M')])

In [13]: m
Out[13]:
MultiIndex(levels=[[1, 2], [2018-01-31 00:00:00, 2018-02-28 00:00:00, 2018-03-31 00:00:00]],
           labels=[[0, 0, 0, 1, 1, 1], [0, 1, 2, 0, 1, 2]])

In [14]: res.reindex(m)
Out[14]:
                B
1 2018-01-31  [3]
  2018-02-28  NaN
  2018-03-31  NaN
2 2018-01-31  NaN
  2018-02-28  NaN
  2018-03-31  [4]
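For convenience, the session above can be condensed into a standalone script. This is only a sketch of the same groupby-then-reindex approach; the names month_end, months, full_index and full are made up for illustration. It passes a pd.offsets.MonthEnd() object instead of the 'M' string, on the assumption that this behaves the same while avoiding the deprecation warning that recent pandas releases emit for the 'M' alias.

```python
import pandas as pd

# The example DataFrame from above.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-03-01"]),
    "A": [1, 2],
    "B": [3, 4],
})

# Month-end offset object, used in place of the 'M' string alias.
month_end = pd.offsets.MonthEnd()

# Group by A and by calendar month; only non-empty groups come back.
res = df.groupby(["A", pd.Grouper(key="date", freq=month_end)]).agg({"B": list})

# Every month end between the first and last date in the data.
months = pd.date_range(df["date"].min(), df["date"].max() + month_end, freq=month_end)

# Every (A, month) combination, including those with no rows.
full_index = pd.MultiIndex.from_product([df["A"].unique(), months], names=["A", "date"])

# Missing combinations show up as NaN.
full = res.reindex(full_index)
print(full)
```

The reindex step is what adds the empty groups; the groupby itself still only produces the two groups that have data.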

Note: filling the NaN values with [] is a little tricky; ideally you'd be able to work around this (in general, having lists inside a DataFrame is not recommended).
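One possible workaround for the empty-list fill, offered as a sketch rather than a recommended pattern: fillna does not take a list as its fill value, so the replacement can be done element by element with apply instead. The Series s below is a hypothetical stand-in for the B column after reindexing.

```python
import numpy as np
import pandas as pd

# Stand-in for the reindexed B column: lists where data exists, NaN elsewhere.
s = pd.Series([[3], np.nan, np.nan, [4]], dtype=object)

# Keep existing lists; replace anything else (here, NaN) with a fresh empty list.
filled = s.apply(lambda v: v if isinstance(v, list) else [])
print(filled.tolist())
```

Each missing entry gets its own new list object, so mutating one of them later does not affect the others.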
