
Pandas - Conditional resampling on MultiIndex based DataFrame based on a boolean column

I have a df with a MultiIndex [(latitude, longitude, time)] and 148 x 244 x 90 x 24 rows. For each latitude and longitude, the time is hourly from 2014-01-01 00:00:00 to 2014-03-31 23:00:00.

                                                FFDI         isInRange
latitude    longitude   time    
-39.20000   140.80000   2014-01-01 00:00:00     6.20000      True
                        2014-01-01 01:00:00     4.10000      True
                        2014-01-01 02:00:00     2.40000      True
                        2014-01-01 03:00:00     1.90000      True
                        2014-01-01 04:00:00     1.70000      True
                        2014-01-01 05:00:00     1.50000      True
                        2014-01-01 06:00:00     1.40000      True
                        2014-01-01 07:00:00     1.30000      True
                        2014-01-01 08:00:00     1.20000      True
                        2014-01-01 09:00:00     1.00000      True
                        2014-01-01 10:00:00     1.00000      True
                        2014-01-01 11:00:00     0.90000      True
                        2014-01-01 12:00:00     0.90000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            140.83786   2014-01-01 00:00:00     3.20000      True
                        2014-01-01 01:00:00     2.90000      True
                        2014-01-01 02:00:00     2.10000      True
                        2014-01-01 03:00:00     2.90000      True
                        2014-01-01 04:00:00     1.20000      True
                        2014-01-01 05:00:00     0.90000      True
                        2014-01-01 06:00:00     1.10000      True
                        2014-01-01 07:00:00     1.60000      True
                        2014-01-01 08:00:00     1.40000      True
                        2014-01-01 09:00:00     1.50000      True
                        2014-01-01 10:00:00     1.20000      True
                        2014-01-01 11:00:00     0.80000      True
                        2014-01-01 12:00:00     0.40000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            ... ... ... ...
... ... ...
-33.90000   140.80000   2014-01-01 00:00:00     6.20000      True
                        2014-01-01 01:00:00     4.10000      True
                        2014-01-01 02:00:00     2.40000      True
                        2014-01-01 03:00:00     1.90000      True
                        2014-01-01 04:00:00     1.70000      True
                        2014-01-01 05:00:00     1.50000      True
                        2014-01-01 06:00:00     1.40000      True
                        2014-01-01 07:00:00     1.30000      True
                        2014-01-01 08:00:00     1.20000      True
                        2014-01-01 09:00:00     1.00000      True
                        2014-01-01 10:00:00     1.00000      True
                        2014-01-01 11:00:00     0.90000      True
                        2014-01-01 12:00:00     0.90000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False
            140.83786   2014-01-01 00:00:00     3.20000      True
                        2014-01-01 01:00:00     2.90000      True
                        2014-01-01 02:00:00     2.10000      True
                        2014-01-01 03:00:00     2.90000      True
                        2014-01-01 04:00:00     1.20000      True
                        2014-01-01 05:00:00     0.90000      True
                        2014-01-01 06:00:00     1.10000      True
                        2014-01-01 07:00:00     1.60000      True
                        2014-01-01 08:00:00     1.40000      True
                        2014-01-01 09:00:00     1.50000      True
                        2014-01-01 10:00:00     1.20000      True
                        2014-01-01 11:00:00     0.80000      True
                        2014-01-01 12:00:00     0.40000      True
                        ... ... ... ...
                        2014-03-31 21:00:00     0.30000      False
                        2014-03-31 22:00:00     0.30000      False
                        2014-03-31 23:00:00     0.50000      False

78001920 rows × 1 columns

What I want is to calculate the daily maximum FFDI over each 24-hour window, for each latitude and longitude, subject to these conditions:

If isInRange = True  for all 24 hours/rows in the group - use FFDI from 13:00:00 of previous day to 12:00:00 of next day
If isInRange = False for all 24 hours/rows in the group - use FFDI from 14:00:00 of previous day to 13:00:00 of next day

My code is:

df_daily_max = df.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI') if df['isInRange'] else isInRange.groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max().reset_index(name='Max FFDI')

However this line raised an error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
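The error comes from the conditional expression itself: `A if cond else B` needs a single True/False, but df['isInRange'] is a whole boolean Series, so pandas refuses to pick one truth value for it. A minimal sketch of that behaviour, using a made-up three-element Series (the name `flags` is only for illustration):

```python
import pandas as pd

# Toy stand-in for df['isInRange']; the values are invented for illustration.
flags = pd.Series([True, True, False], name="isInRange")

message = None
try:
    # Same situation as `... if df['isInRange'] else ...` in the question:
    # Python asks the Series for one truth value and pandas raises.
    if flags:
        pass
except ValueError as exc:
    message = str(exc)

print(message)

# Reducing the Series to a scalar is unambiguous...
all_true = bool(flags.all())
any_true = bool(flags.any())

# ...but to treat rows differently, use the Series as a row mask instead.
true_rows = flags[flags]
false_rows = flags[~flags]
```

Reducing with .all() or .any() would apply one rule to the entire frame, so when the two cases need different handling, masking the rows is usually the more useful route.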

You can first filter all the True rows and aggregate their max, do the same for all the False rows, then join the two results with concat, sort the MultiIndex, and convert to a DataFrame with Series.reset_index:

s1 = df[df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=13,loffset='11H',label='right',level='time')])['FFDI'].max()

s2 = df[~df['isInRange']].groupby(['latitude', 'longitude', pd.Grouper(freq='24H',base=14,loffset='10H',label='right',level='time')])['FFDI'].max()

df_daily_max = pd.concat([s1, s2]).sort_index().reset_index(name='Max FFDI')
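Note that the `base` and `loffset` arguments of pd.Grouper were deprecated in pandas 1.1 and removed in pandas 2.0. On newer versions the same idea can be sketched with `offset` for the window start and an explicit Timedelta added to the resulting labels. The helper name `daily_max`, its parameters, and the toy data below are assumptions made for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Toy data (invented): one latitude, two longitudes, three days of hourly values.
times = pd.date_range("2014-01-01", periods=72, freq="h")
index = pd.MultiIndex.from_product(
    [[-39.2], [140.8, 140.9], times],
    names=["latitude", "longitude", "time"],
)
df = pd.DataFrame(
    {
        "FFDI": np.arange(144, dtype=float),
        # First longitude is entirely in range, second entirely out of range.
        "isInRange": np.repeat([True, False], 72),
    },
    index=index,
)


def daily_max(frame, start_hour, label_shift_hours):
    """Hypothetical helper: max FFDI per 24-hour window starting at start_hour."""
    grouper = pd.Grouper(
        freq="24h",
        offset=f"{start_hour}h",  # stands in for the removed `base`
        label="right",
        level="time",
    )
    out = frame.groupby(["latitude", "longitude", grouper])["FFDI"].max()
    out = out.reset_index(name="Max FFDI")
    # Stands in for the removed `loffset`: shift the bin labels by hand.
    out["time"] = out["time"] + pd.Timedelta(hours=label_shift_hours)
    return out


in_range = daily_max(df[df["isInRange"]], start_hour=13, label_shift_hours=11)
out_of_range = daily_max(df[~df["isInRange"]], start_hour=14, label_shift_hours=10)

result = (
    pd.concat([in_range, out_of_range])
    .sort_values(["latitude", "longitude", "time"])
    .reset_index(drop=True)
)
print(result)
```

With label='right' plus the shift, each window is labelled with the midnight that follows its end, which is how I read the effect of the original base/loffset combination. Like the answer above, this masks individual rows rather than checking that all 24 rows of a window share the same flag.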
