Pandas - Conditional resampling on a MultiIndex-based DataFrame based on a boolean column
I have a df which has a MultiIndex [(latitude, longitude, time)] with the number of rows being 148 x 244 x 90 x 24. For each latitude and longitude, the time is hourly from 2014-01-01 00:00:00 to 2014-03-31 23:00:00.
FFDI isInRange
latitude longitude time
-39.20000 140.80000 2014-01-01 00:00:00 6.20000 True
2014-01-01 01:00:00 4.10000 True
2014-01-01 02:00:00 2.40000 True
2014-01-01 03:00:00 1.90000 True
2014-01-01 04:00:00 1.70000 True
2014-01-01 05:00:00 1.50000 True
2014-01-01 06:00:00 1.40000 True
2014-01-01 07:00:00 1.30000 True
2014-01-01 08:00:00 1.20000 True
2014-01-01 09:00:00 1.00000 True
2014-01-01 10:00:00 1.00000 True
2014-01-01 11:00:00 0.90000 True
2014-01-01 12:00:00 0.90000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
140.83786 2014-01-01 00:00:00 3.20000 True
2014-01-01 01:00:00 2.90000 True
2014-01-01 02:00:00 2.10000 True
2014-01-01 03:00:00 2.90000 True
2014-01-01 04:00:00 1.20000 True
2014-01-01 05:00:00 0.90000 True
2014-01-01 06:00:00 1.10000 True
2014-01-01 07:00:00 1.60000 True
2014-01-01 08:00:00 1.40000 True
2014-01-01 09:00:00 1.50000 True
2014-01-01 10:00:00 1.20000 True
2014-01-01 11:00:00 0.80000 True
2014-01-01 12:00:00 0.40000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
... ... ... ...
... ... ...
-33.90000 140.80000 2014-01-01 00:00:00 6.20000 True
2014-01-01 01:00:00 4.10000 True
2014-01-01 02:00:00 2.40000 True
2014-01-01 03:00:00 1.90000 True
2014-01-01 04:00:00 1.70000 True
2014-01-01 05:00:00 1.50000 True
2014-01-01 06:00:00 1.40000 True
2014-01-01 07:00:00 1.30000 True
2014-01-01 08:00:00 1.20000 True
2014-01-01 09:00:00 1.00000 True
2014-01-01 10:00:00 1.00000 True
2014-01-01 11:00:00 0.90000 True
2014-01-01 12:00:00 0.90000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
140.83786 2014-01-01 00:00:00 3.20000 True
2014-01-01 01:00:00 2.90000 True
2014-01-01 02:00:00 2.10000 True
2014-01-01 03:00:00 2.90000 True
2014-01-01 04:00:00 1.20000 True
2014-01-01 05:00:00 0.90000 True
2014-01-01 06:00:00 1.10000 True
2014-01-01 07:00:00 1.60000 True
2014-01-01 08:00:00 1.40000 True
2014-01-01 09:00:00 1.50000 True
2014-01-01 10:00:00 1.20000 True
2014-01-01 11:00:00 0.80000 True
2014-01-01 12:00:00 0.40000 True
... ... ... ...
2014-03-31 21:00:00 0.30000 False
2014-03-31 22:00:00 0.30000 False
2014-03-31 23:00:00 0.50000 False
78001920 rows × 1 columns
What I want to achieve is to calculate a daily maximum FFDI value over each 24-hour window for each latitude and longitude, subject to the following conditions:
If isInRange = True for all 24 hours/rows in the group - use FFDI from 13:00:00 of previous day to 12:00:00 of next day
If isInRange = False for all 24 hours/rows in the group - use FFDI from 14:00:00 of previous day to 13:00:00 of next day
My code is:
df_daily_max = df.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI') if df['isInRange'] else isInRange.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI')
However, this line raised an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
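For context, the error comes from the `if df['isInRange'] else` part of the expression: Python asks the whole boolean Series for a single truth value, and pandas refuses to pick one. A minimal sketch of the same failure, using a hypothetical three-element Series rather than the real data:

```python
import pandas as pd

# Hypothetical stand-in for df['isInRange']
flags = pd.Series([True, True, False], name="isInRange")

# Using a Series as an `if` condition raises the same ValueError
try:
    if flags:
        pass
except ValueError as exc:
    message = str(exc)

# Reductions give a single bool that an `if` can use
all_true = bool(flags.all())
any_true = bool(flags.any())

# Row-wise selection uses the Series as a boolean mask instead
selected = flags[flags]
```

An inline `if ... else` therefore cannot choose a grouper row by row; the rows have to be selected with a mask first.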
You can first filter the rows where isInRange is True and then the rows where it is False, take the max of each, join the two results with concat, sort the MultiIndex, and convert to a DataFrame with Series.reset_index:
s1 = df[df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max()
s2 = df[~df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max()
df_daily_max = pd.concat([s1, s2]).sort_index().reset_index(name='Max FFDI')
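Note that the `base` and `loffset` arguments used above were deprecated in pandas 1.1 and removed in pandas 2.0. On newer versions, one possible equivalent is to pass `offset` to `pd.Grouper` and shift the resulting labels by hand. The sketch below is an assumption-laden illustration, not the original answer's code: the helper name `daily_max` and the two-point toy DataFrame are hypothetical.

```python
import pandas as pd


def daily_max(frame, start_hour):
    """Max FFDI per 24-hour window starting at start_hour each day."""
    grouper = pd.Grouper(
        freq="24h",
        offset=pd.Timedelta(hours=start_hour),  # stands in for base=start_hour
        label="right",
        level="time",
    )
    out = (
        frame.groupby(["latitude", "longitude", grouper])["FFDI"]
        .max()
        .reset_index(name="Max FFDI")
    )
    # Stands in for loffset: move the right-edge label forward to midnight
    out["time"] = out["time"] + pd.Timedelta(hours=24 - start_hour)
    return out


# Hypothetical toy data: two grid points, 48 hourly rows each
times = pd.date_range("2014-01-01 00:00", periods=48, freq="h")


def make_point(lat, in_range):
    index = pd.MultiIndex.from_product(
        [[lat], [140.8], times], names=["latitude", "longitude", "time"]
    )
    return pd.DataFrame(
        {"FFDI": [float(i) for i in range(48)], "isInRange": in_range},
        index=index,
    )


df = pd.concat([make_point(-39.2, True), make_point(-33.9, False)])

in_range = daily_max(df[df["isInRange"]], start_hour=13)
out_of_range = daily_max(df[~df["isInRange"]], start_hour=14)

result = (
    pd.concat([in_range, out_of_range])
    .sort_values(["latitude", "longitude", "time"])
    .reset_index(drop=True)
)
```

Because the toy data starts at midnight, the first window of each point is only partially filled. The shift of `24 - start_hour` hours reproduces the 11H and 10H label offsets used with base 13 and base 14.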