Pandas: group by city and month, and fill in missing months

Question

I have a DataFrame with several cities and multiple values per month for each city. I need to group those values by city and month, filling missing months with NA.

Grouping by city and month works:

self.probes[['city', 'date', 'value']].groupby(['city',pd.Grouper(key='date', freq='M')])

| Munich   | 2018-06 | values... |
| Munich   | 2018-08 | values... |
| Munich   | 2018-09 | values... |
| New York | 2018-06 | values... |
| New York | 2018-07 | values... |

But I can't manage to include the missing months. What I want is:

| Munich   | 2018-06 | values... |
| Munich   |*2018-07*| NA instead of values |
| Munich   | 2018-08 | values... |
| Munich   | 2018-09 | values... |
| New York | 2018-06 | values... |
| New York | 2018-07 | values... |

Answer 1

I think you need to add an aggregate function such as sum first:

print (probes)
       city        date  value
0    Munich  2018-06-01      4
1    Munich  2018-08-01      1
2    Munich  2018-08-03      5
3    Munich  2018-09-01      1
4  New York  2018-06-01      1
5  New York  2018-07-01      2

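# Parse the dates, then sum the values per city and month (labelled by month end).
# Note: pandas 2.2 and later deprecate the 'M' alias in favour of 'ME'.
probes['date'] = pd.to_datetime(probes['date'])
s = probes.groupby(['city', pd.Grouper(key='date', freq='M')])['value'].sum()
print(s)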
city      date      
Munich    2018-06-30    4
          2018-08-31    6
          2018-09-30    1
New York  2018-06-30    1
          2018-07-31    2
Name: value, dtype: int64
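
If you want to reproduce this step from scratch, here is a minimal sketch. The DataFrame construction is my assumption, copied from the printed frame above, and I pass a `pd.offsets.MonthEnd()` object instead of the `'M'` string so the snippet does not depend on which alias your pandas version accepts:

```python
import pandas as pd

# Sample data copied from the printed frame above (construction is an assumption).
probes = pd.DataFrame({
    'city': ['Munich', 'Munich', 'Munich', 'Munich', 'New York', 'New York'],
    'date': ['2018-06-01', '2018-08-01', '2018-08-03', '2018-09-01',
             '2018-06-01', '2018-07-01'],
    'value': [4, 1, 5, 1, 1, 2],
})
probes['date'] = pd.to_datetime(probes['date'])

# Month-end offset object, used in place of the 'M' string alias.
month_end = pd.offsets.MonthEnd()

# One row per (city, month) that actually occurs in the data;
# months without any rows are simply absent at this stage.
s = probes.groupby(['city', pd.Grouper(key='date', freq=month_end)])['value'].sum()
print(s)
```

Any other aggregation (for example `.mean()`) can be swapped in for `.sum()` here; the missing months are only filled in the next step.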

Then group by city and apply asfreq to each group. The reset_index(level=0) call is needed first so that each group is indexed by a plain DatetimeIndex, which asfreq requires:

df1 = (s.reset_index(level=0)
        .groupby('city')['value']
        .apply(lambda x: x.asfreq('M'))
        .reset_index())
print (df1)
       city       date  value
0    Munich 2018-06-30    4.0
1    Munich 2018-07-31    NaN
2    Munich 2018-08-31    6.0
3    Munich 2018-09-30    1.0
4  New York 2018-06-30    1.0
5  New York 2018-07-31    2.0
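
For reuse, both steps can be wrapped in one function. This is only a sketch: `monthly_sum_with_gaps` is a hypothetical name, the column names `city`, `date` and `value` are assumed, and I replaced the `groupby(...).apply(...)` call with an explicit loop plus `pd.concat` so the result does not depend on how `apply` attaches the group keys:

```python
import pandas as pd


def monthly_sum_with_gaps(df, freq):
    """Sum 'value' per city and month, adding NaN rows for months missing
    between each city's first and last observed month.

    Hypothetical helper; assumes columns 'city', 'date' (datetime64) and 'value'.
    """
    s = df.groupby(['city', pd.Grouper(key='date', freq=freq)])['value'].sum()

    # For each city, drop the 'city' level to get a plain DatetimeIndex,
    # then let asfreq insert the missing months as NaN.
    pieces = {
        city: grp.droplevel('city').asfreq(freq)
        for city, grp in s.groupby(level='city')
    }

    # Stitch the per-city series back together with 'city' as the outer level.
    return pd.concat(pieces, names=['city', 'date']).reset_index(name='value')


# Sample data copied from the printed frame above (construction is an assumption).
probes = pd.DataFrame({
    'city': ['Munich', 'Munich', 'Munich', 'Munich', 'New York', 'New York'],
    'date': pd.to_datetime(['2018-06-01', '2018-08-01', '2018-08-03',
                            '2018-09-01', '2018-06-01', '2018-07-01']),
    'value': [4, 1, 5, 1, 1, 2],
})

# Month-start labels here; pd.offsets.MonthEnd() gives month-end labels instead.
out = monthly_sum_with_gaps(probes, pd.offsets.MonthBegin())
print(out)
```

Note that each city is only filled between its own first and last month, so New York does not get rows for August or September.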

It is also possible to use MS to label each month by its start:

probes['date'] = pd.to_datetime(probes['date'])
s = probes.groupby(['city',pd.Grouper(key='date', freq='MS')])['value'].sum()

df1 = (s.reset_index(level=0)
        .groupby('city')['value']
        .apply(lambda x: x.asfreq('MS'))
        .reset_index()
        )
print (df1)
       city       date  value
0    Munich 2018-06-01    4.0
1    Munich 2018-07-01    NaN
2    Munich 2018-08-01    6.0
3    Munich 2018-09-01    1.0
4  New York 2018-06-01    1.0
5  New York 2018-07-01    2.0
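
A related variant, in case every city should cover the same overall range of months (this is not what the question's expected output shows, where New York stops at 2018-07): reindex the aggregated Series against the full city-by-month product. A rough sketch, with the sample frame rebuilt from the printed data and `all_months`, `full_index` and `df_all` being names I made up:

```python
import pandas as pd

# Sample data copied from the printed frame above (construction is an assumption).
probes = pd.DataFrame({
    'city': ['Munich', 'Munich', 'Munich', 'Munich', 'New York', 'New York'],
    'date': pd.to_datetime(['2018-06-01', '2018-08-01', '2018-08-03',
                            '2018-09-01', '2018-06-01', '2018-07-01']),
    'value': [4, 1, 5, 1, 1, 2],
})

# Aggregate per city and month, labelled by month start.
s = probes.groupby(['city', pd.Grouper(key='date', freq='MS')])['value'].sum()

# Every month between the earliest and latest month seen in any city.
months = s.index.get_level_values('date')
all_months = pd.date_range(months.min(), months.max(), freq='MS')

# Full grid of (city, month) pairs; pairs without data become NaN on reindex.
full_index = pd.MultiIndex.from_product(
    [s.index.get_level_values('city').unique(), all_months],
    names=['city', 'date'],
)
df_all = s.reindex(full_index).reset_index()
print(df_all)
```

With the sample data this gives four months for each city, so New York also gets NaN rows for August and September.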

Answer 1
7 2018-11-08 14:07:22
