
Pandas rolling mean and selective indexing by time

I have a dataset that I've reindexed by date (datetime.datetime). A small sample of the dataframe, df2, looks like this:

                          lat         lon   Press  NetLW
rounded_dt
1997-11-30 17:00:00  76.15387  -147.62606   998.8  -51.0
1997-11-30 18:00:00  76.15280  -147.60379  1000.0  -50.9
1997-11-30 19:00:00  76.15164  -147.58055  1001.1  -54.4
1997-11-30 20:00:00  76.15037  -147.56047  1002.6  -52.2
1997-11-30 21:00:00  76.14948  -147.54034  1004.2  -51.9
1997-11-30 22:00:00  76.14834  -147.52181  1005.5  -51.3
1997-11-30 23:00:00  76.14777  -147.50568  1006.5  -50.7
1997-12-01 06:00:00  76.14152  -147.42073  1013.3  -44.6
1997-12-01 07:00:00  76.14105  -147.41370  1013.8  -45.4
1997-12-01 08:00:00  76.14072  -147.40661  1014.5  -46.1
1997-12-01 09:00:00  76.14059  -147.40093  1015.0  -43.0

So the time series is hourly and continues for a year.

What is my aim?

I would like to extract the data based on NetLW for a specific range of days, and only at hour 11 and hour 23 of those days. The NetLW at a given hour, say hour 11, should be the average of NetLW(hour 10), NetLW(hour 11) and NetLW(hour 12).

What have I done so far?

df3 = df2.rolling(window=3, center=True).mean() # to get the rolling mean
# I want to extract the dates of interest from df3
dates_list =[]
for idx in df2.index:
    # Winter dates (Dec-March)
    if idx > datetime.datetime(1997, 11, 30, 23) and idx < datetime.datetime(1998, 3, 1, 0): 
       if idx.hour ==11 or idx.hour == 23:
          dates_list.append(df3[df3.loc[idx, 'NetLW'] < -30.0])    

Then I could concatenate dates_list into one series/dataframe and get the dates.

Error message: KeyError: True

During handling of the above exception, another exception occurred

And it points to this line:

---> dates_list.append(df3[df3.loc[idx, 'NetLW'] < -30.0])

I expected the comparison to give a boolean that I could use to index df3 and extract the data.

Also, if it is possible to group by the hours I'm interested in instead of writing multiple loops, please let me know, as I'm new to Pandas.

Boolean indexing on a dataframe needs a boolean array or Series with one value per row, passed to df3[...] or df3.loc[...]. What is happening here is that, inside the loop, df3.loc[idx, 'NetLW'] returns a single value, so the comparison gives a single True or False. Freely translated: df3_clear = df3[True or False]. I am afraid you do not have a column called True. Neither False.
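To make the difference concrete, here is a minimal sketch. The three-row frame and its values are made up for illustration and only stand in for df3; the point is the contrast between a single boolean and a per-row mask.

```python
import pandas as pd

# Hypothetical three-row frame standing in for df3 (values are invented)
idx = pd.to_datetime(["1997-12-01 10:00", "1997-12-01 11:00", "1997-12-01 12:00"])
df3 = pd.DataFrame({"NetLW": [-44.6, -25.0, -46.1]}, index=idx)

# .loc with one row label and one column label returns a scalar...
value = df3.loc[idx[0], "NetLW"]
flag = value < -30.0  # ...so the comparison is a single boolean

# df3[flag] is treated as a lookup of a column named True
try:
    df3[flag]
    raised = False
except KeyError:
    raised = True

# A mask with one boolean per row is what boolean indexing expects
mask = df3["NetLW"] < -30.0
selected = df3[mask]
```

Here `selected` keeps the 10:00 and 12:00 rows, while the scalar version raises the same KeyError as in the question.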

What you are looking for seems to be along the lines of the following (it can probably be a one-liner, but I'm a bit lazy):

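import numpy as np  # needed for np.where below
df3_clear = df3.loc['1997-11-30 23:00':'1998-03-01'].query('NetLW < -30')  # winter rows with averaged NetLW below -30
df3_clear = df3_clear.iloc[np.where((df3_clear.index.hour == 23) | (df3_clear.index.hour == 11))]  # keep only 11:00 and 23:00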
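For the groupby part of the question, the same selection can be sketched with a single boolean mask and then split by hour. This is only an illustration under assumptions: the hourly data below is synthetic, the date range is shortened, and `winter`, `result` and `by_hour` are names chosen here. Note that `rolling(window=3)` counts rows rather than hours, so where hours are missing (as between 23:00 and 06:00 in the sample) the neighbouring rows are not the adjacent hours.

```python
import numpy as np
import pandas as pd

# Hypothetical gap-free hourly data standing in for df2 (values are made up)
index = pd.date_range("1997-11-30 00:00", "1997-12-03 23:00", freq=pd.Timedelta(hours=1))
df2 = pd.DataFrame({"NetLW": np.linspace(-60.0, -15.0, len(index))}, index=index)

# Centered 3-row rolling mean: previous, current and next row
df3 = df2.rolling(window=3, center=True).mean()

# Label-based slice of the period of interest (both ends inclusive)
winter = df3.loc["1997-12-01":"1997-12-02"]

# One boolean per row: averaged NetLW below -30 and hour equal to 11 or 23
mask = (winter["NetLW"] < -30.0) & winter.index.hour.isin([11, 23])
result = winter[mask]

# Split the selected rows by hour instead of looping over the index
by_hour = {hour: group for hour, group in result.groupby(result.index.hour)}
```

`by_hour[11]` and `by_hour[23]` then hold the selected rows for each hour, and `result.index` gives the matching timestamps directly.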
