
Pandas - Conditional resampling on MultiIndex based DataFrame based on a boolean column

I have a df with a MultiIndex [(latitude, longitude, time)] and 148 x 244 x 90 x 24 rows. For each latitude and longitude, the time is hourly from 2014-01-01 00:00:00 to 2014-03-31 23:00:00.

                                                FFDI         isInRange
latitude    longitude   time    
-39.20000   140.80000   2014-01-01 00:00:00     6.20000      True
                        2014-01-01 01:00:00     4.10000      True
                        2014-01-01 02:00:00     2.40000      True
                        2014-01-01 03:00:00     1.90000      True
                        2014-01-01 04:00:00     1.70000      True
                        2014-01-01 05:00:00     1.50000      True
                        2014-01-01 06:00:00     1.40000      True
                        2014-01-01 07:00:00     1.30000      True
                        2014-01-01 08:00:00     1.20000      True
                        2014-01-01 09:00:00     1.00000      True
                        2014-01-01 10:00:00     1.00000      True
                        2014-01-01 11:00:00     0.90000      True
                        2014-01-01 12:00:00     0.90000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            140.83786   2014-01-01 00:00:00     3.20000      True
                        2014-01-01 01:00:00     2.90000      True
                        2014-01-01 02:00:00     2.10000      True
                        2014-01-01 03:00:00     2.90000      True
                        2014-01-01 04:00:00     1.20000      True
                        2014-01-01 05:00:00     0.90000      True
                        2014-01-01 06:00:00     1.10000      True
                        2014-01-01 07:00:00     1.60000      True
                        2014-01-01 08:00:00     1.40000      True
                        2014-01-01 09:00:00     1.50000      True
                        2014-01-01 10:00:00     1.20000      True
                        2014-01-01 11:00:00     0.80000      True
                        2014-01-01 12:00:00     0.40000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            ... ... ... ...
... ... ...
-33.90000   140.80000   2014-01-01 00:00:00     6.20000      True
                        2014-01-01 01:00:00     4.10000      True
                        2014-01-01 02:00:00     2.40000      True
                        2014-01-01 03:00:00     1.90000      True
                        2014-01-01 04:00:00     1.70000      True
                        2014-01-01 05:00:00     1.50000      True
                        2014-01-01 06:00:00     1.40000      True
                        2014-01-01 07:00:00     1.30000      True
                        2014-01-01 08:00:00     1.20000      True
                        2014-01-01 09:00:00     1.00000      True
                        2014-01-01 10:00:00     1.00000      True
                        2014-01-01 11:00:00     0.90000      True
                        2014-01-01 12:00:00     0.90000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            140.83786   2014-01-01 00:00:00     3.20000      True
                        2014-01-01 01:00:00     2.90000      True
                        2014-01-01 02:00:00     2.10000      True
                        2014-01-01 03:00:00     2.90000      True
                        2014-01-01 04:00:00     1.20000      True
                        2014-01-01 05:00:00     0.90000      True
                        2014-01-01 06:00:00     1.10000      True
                        2014-01-01 07:00:00     1.60000      True
                        2014-01-01 08:00:00     1.40000      True
                        2014-01-01 09:00:00     1.50000      True
                        2014-01-01 10:00:00     1.20000      True
                        2014-01-01 11:00:00     0.80000      True
                        2014-01-01 12:00:00     0.40000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False

78001920 rows × 1 columns
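For anyone who wants to experiment without the full 78 million rows, a much smaller frame with the same index layout can be built as a rough sketch. The grid values, the two-day time span, the random FFDI numbers and the rule used to fill isInRange are all made up for illustration and are not taken from the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature grid: 2 latitudes x 2 longitudes x 48 hourly steps.
lats = [-39.2, -33.9]
lons = [140.8, 140.83786]
times = pd.date_range("2014-01-01 00:00", "2014-01-02 23:00", freq="h")

index = pd.MultiIndex.from_product(
    [lats, lons, times], names=["latitude", "longitude", "time"]
)

rng = np.random.default_rng(0)
toy = pd.DataFrame(
    {
        # Random placeholder values, not real FFDI readings.
        "FFDI": rng.uniform(0, 10, len(index)).round(1),
        # Assumed rule: flag is True on the first day, False on the second.
        "isInRange": index.get_level_values("time") < pd.Timestamp("2014-01-02"),
    },
    index=index,
)

print(toy.head())
print(toy.shape)
```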

What I want to achieve is to calculate a daily maximum FFDI value for every 24 hours, for each latitude and longitude, under the following conditions:

If isInRange = True  for all 24 hours/rows in the group - use FFDI from 13:00:00 of previous day to 12:00:00 of next day
If isInRange = False for all 24 hours/rows in the group - use FFDI from 14:00:00 of previous day to 13:00:00 of next day

My code is:

df_daily_max = df.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI') if df['isInRange'] else isInRange.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI')

However, this line raised an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
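The message appears because `df['isInRange']` is a whole Series, while a Python conditional expression (`x if cond else y`) needs a single True or False. A minimal sketch of the same failure on a three-element Series (the values and the window labels are made up for illustration):

```python
import pandas as pd

flags = pd.Series([True, True, False], name="isInRange")

try:
    # A conditional expression needs one bool, but a Series holds many.
    result = "13:00 window" if flags else "14:00 window"
except ValueError as err:
    message = str(err)
    print(message)

# Reducing the Series to one value is unambiguous...
print(flags.all())
print(flags.any())

# ...while a boolean mask selects rows, which is what a per-row rule needs.
true_rows = flags[flags].index.tolist()
false_rows = flags[~flags].index.tolist()
print(true_rows, false_rows)
```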

You can first filter all True rows and then all False rows, aggregate each with max, then join the results with concat, sort the MultiIndex, and convert to a DataFrame with Series.reset_index:

s1 = df[df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max()

s2 = df[~df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max()

df_daily_max = pd.concat([s1, s2]).sort_index().reset_index(name='Max FFDI')
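Note that the `base` and `loffset` arguments used above were deprecated in pandas 1.1 and removed in pandas 2.0. The sketch below shows one way the same idea might be written on newer versions, assuming `offset` stands in for `base` and a manual shift of the time labels stands in for `loffset`. The toy data and the helper name `daily_max` are hypothetical and only there to make the example runnable.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data: 2 grid points, 3 days of hourly values.
times = pd.date_range("2014-01-01 00:00", periods=72, freq="h")
index = pd.MultiIndex.from_product(
    [[-39.2], [140.8, 140.83786], times],
    names=["latitude", "longitude", "time"],
)
df = pd.DataFrame(
    {
        "FFDI": np.arange(len(index), dtype=float),
        # Assumed flag: True at the first longitude, False at the second.
        "isInRange": index.get_level_values("longitude") == 140.8,
    },
    index=index,
)


def daily_max(frame, start_hour):
    """Max FFDI per grid point over 24-hour bins starting at start_hour."""
    grouper = pd.Grouper(
        freq="24h",
        offset=pd.Timedelta(hours=start_hour),  # stands in for base=start_hour
        label="right",
        level="time",
    )
    out = frame.groupby(["latitude", "longitude", grouper])["FFDI"].max()
    out = out.reset_index(name="Max FFDI")
    # Stands in for loffset: push the right-edge label forward to midnight.
    out["time"] = out["time"] + pd.Timedelta(hours=24 - start_hour)
    return out


parts = [
    daily_max(df[df["isInRange"]], start_hour=13),
    daily_max(df[~df["isInRange"]], start_hour=14),
]
df_daily_max = (
    pd.concat(parts)
    .sort_values(["latitude", "longitude", "time"])
    .reset_index(drop=True)
)
print(df_daily_max)
```

As in the answer above, the split is made row by row. It matches the rule stated in the question only if isInRange does not change inside a 24-hour group.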
