Pandas groupby city and month and fill in missing months
I have a DataFrame with several cities and multiple values for every month. I need to group those values by city and month, filling missing months with NA.
Grouping by city and month works:
self.probes[['city', 'date', 'value']].groupby(['city',pd.Grouper(key='date', freq='M')])
| Munich | 2018-06 | values... |
| Munich | 2018-08 | values... |
| Munich | 2018-09 | values... |
| New York | 2018-06 | values... |
| New York | 2018-07 | values... |
But I can't manage to include the missing months. This is the result I want:
| Munich | 2018-06 | values... |
| Munich |*2018-07*| NA instead of values |
| Munich | 2018-08 | values... |
| Munich | 2018-09 | values... |
| New York | 2018-06 | values... |
| New York | 2018-07 | values... |
I think you need to add an aggregate function such as sum first:
print (probes)
city date value
0 Munich 2018-06-01 4
1 Munich 2018-08-01 1
2 Munich 2018-08-03 5
3 Munich 2018-09-01 1
4 New York 2018-06-01 1
5 New York 2018-07-01 2
probes['date'] = pd.to_datetime(probes['date'])
s = probes.groupby(['city',pd.Grouper(key='date', freq='M')])['value'].sum()
print (s)
city date
Munich 2018-06-30 4
2018-08-31 6
2018-09-30 1
New York 2018-06-30 1
2018-07-31 2
Name: value, dtype: int64
Then group by city and apply asfreq to each group. The reset_index(level=0) call is needed first, so that only the dates remain as a DatetimeIndex:
df1 = (s.reset_index(level=0)
        .groupby('city')['value']
        .apply(lambda x: x.asfreq('M'))
        .reset_index())
print (df1)
city date value
0 Munich 2018-06-30 4.0
1 Munich 2018-07-31 NaN
2 Munich 2018-08-31 6.0
3 Munich 2018-09-30 1.0
4 New York 2018-06-30 1.0
5 New York 2018-07-31 2.0
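To see why asfreq fills the gap, it may help to look at a single city on its own. The following is a minimal sketch, not part of the original answer: the Series `munich` is built by hand to mimic the monthly sums for one city, and it uses `'MS'` (month start) because recent pandas releases prefer `'ME'` over `'M'` for month end.

```python
import pandas as pd

# Hand-built monthly totals for one city (illustrative data only),
# indexed by month start; July 2018 is absent.
munich = pd.Series(
    [4, 6, 1],
    index=pd.to_datetime(["2018-06-01", "2018-08-01", "2018-09-01"]),
    name="value",
)

# asfreq builds a regular monthly index from the first to the last date
# and leaves NaN wherever the original index had no entry.
filled = munich.asfreq("MS")
print(filled)
```

Because asfreq only works on a DatetimeIndex, the city level has to be moved out of the index before grouping, which is what reset_index(level=0) does above.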
It is also possible to use MS for the start of the month:
probes['date'] = pd.to_datetime(probes['date'])
s = probes.groupby(['city',pd.Grouper(key='date', freq='MS')])['value'].sum()
df1 = (s.reset_index(level=0)
        .groupby('city')['value']
        .apply(lambda x: x.asfreq('MS'))
        .reset_index())
print (df1)
city date value
0 Munich 2018-06-01 4.0
1 Munich 2018-07-01 NaN
2 Munich 2018-08-01 6.0
3 Munich 2018-09-01 1.0
4 New York 2018-06-01 1.0
5 New York 2018-07-01 2.0
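Note that asfreq only fills gaps between each city's own first and last month, so New York ends at 2018-07 here. If every city should instead cover the same overall range, one possible variant (an assumption about the requirement, not something the question asks for) is to reindex the grouped Series against the full product of cities and months. The sample data is rebuilt so the sketch runs on its own; `full_index` and `df2` are illustrative names.

```python
import pandas as pd

# Same sample data as in the answer above.
probes = pd.DataFrame({
    "city": ["Munich", "Munich", "Munich", "Munich", "New York", "New York"],
    "date": pd.to_datetime(["2018-06-01", "2018-08-01", "2018-08-03",
                            "2018-09-01", "2018-06-01", "2018-07-01"]),
    "value": [4, 1, 5, 1, 1, 2],
})

# Monthly sums per city, labelled by month start.
monthly = probes.groupby(["city", pd.Grouper(key="date", freq="MS")])["value"].sum()

# Build every (city, month) pair over the overall date range.
cities = monthly.index.get_level_values("city").unique()
dates = monthly.index.get_level_values("date")
months = pd.date_range(dates.min(), dates.max(), freq="MS")
full_index = pd.MultiIndex.from_product([cities, months], names=["city", "date"])

# Pairs that were not observed become NaN after reindexing.
df2 = monthly.reindex(full_index).reset_index()
print(df2)
```

With this variant New York also gets rows for 2018-08 and 2018-09, both NaN, which may or may not be what you want.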