I have a dataset that I've reindexed by date (datetime.datetime). A small sample of the dataframe, df2, looks like this:
                          lat        lon   Press  NetLW
rounded_dt
1997-11-30 17:00:00  76.15387 -147.62606   998.8  -51.0
1997-11-30 18:00:00  76.15280 -147.60379  1000.0  -50.9
1997-11-30 19:00:00  76.15164 -147.58055  1001.1  -54.4
1997-11-30 20:00:00  76.15037 -147.56047  1002.6  -52.2
1997-11-30 21:00:00  76.14948 -147.54034  1004.2  -51.9
1997-11-30 22:00:00  76.14834 -147.52181  1005.5  -51.3
1997-11-30 23:00:00  76.14777 -147.50568  1006.5  -50.7
1997-12-01 06:00:00  76.14152 -147.42073  1013.3  -44.6
1997-12-01 07:00:00  76.14105 -147.41370  1013.8  -45.4
1997-12-01 08:00:00  76.14072 -147.40661  1014.5  -46.1
1997-12-01 09:00:00  76.14059 -147.40093  1015.0  -43.0
So the time series has one row per hour and runs for a year.
What is my aim?
I would like to extract rows based on NetLW for a specific range of days, and only at hours 11 and 23 on those days. The NetLW value at each of those hours should be a three-hour average: for hour 11, say, the mean of NetLW at hours 10, 11 and 12.
What have I done so far?
df3 = df2.rolling(window=3, center=True).mean()  # to get the rolling mean
# I want to extract the dates of interest from df3
dates_list = []
for idx in df2.index:
    # Winter dates (Dec-March)
    if idx > datetime.datetime(1997, 11, 30, 23) and idx < datetime.datetime(1998, 3, 1, 0):
        if idx.hour == 11 or idx.hour == 23:
            dates_list.append(df3[df3.loc[idx, 'NetLW'] < -30.0])
And then I could concatenate the dates_list in one series/dataframe and get the dates
Error message:
KeyError: True
During handling of the above exception, another exception occurred
And it points to this line:
---> dates_list.append(df3[df3.loc[idx, 'NetLW'] < -30.0])
I expected the comparison to give me a boolean result that I could use to index df3 and extract the data.
Also, if it is possible to group by the hours I'm interested in instead of writing multiple loops, please let me know, as I'm new to Pandas.
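Boolean indexing in a dataframe needs a boolean mask with one entry per row, passed either as df3[mask] or through the .loc indexer. What is happening here is that, inside the loop, df3.loc[idx, 'NetLW'] < -30.0 compares a single value, so the result is a single True or False rather than a mask. Freely translated, you are evaluating df3[True] or df3[False]. A scalar passed to df3[...] is looked up as a column label, and I am afraid you do not have a column called True. Nor one called False.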
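To make the difference concrete, here is a minimal sketch using a made-up three-row frame called demo (a stand-in for df3, not the asker's data). It assumes only pandas and shows that the scalar comparison reproduces the KeyError, while a comparison on the whole column gives a usable mask:

```python
import pandas as pd

# Hypothetical three-row frame standing in for df3 (values are invented)
idx = pd.date_range("1997-12-01 10:00", periods=3, freq=pd.Timedelta(hours=1))
demo = pd.DataFrame({"NetLW": [-44.6, -25.0, -46.1]}, index=idx)

# Comparing one cell gives one boolean, not a mask
scalar_test = demo.loc[idx[0], "NetLW"] < -30.0

try:
    demo[scalar_test]          # looked up as a column label named True
    raised = False
except KeyError:
    raised = True              # same failure as in the question

# Comparing the whole column gives a boolean Series, one entry per row
mask = demo["NetLW"] < -30.0
selected = demo[mask]          # keeps only the rows where the mask is True
```

The mask form needs no loop at all, which is why the row-by-row approach in the question can be dropped entirely.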
What you are looking for seems to be along these lines (it can probably be a one-liner, but I'm a bit lazy):
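import numpy as np
# Slice the date range, keep rows below the threshold, then keep only hours 11 and 23
df3_clear = df3['1997-11-30 23:00':'1998-03-01'].query('NetLW < -30')
df3_clear = df3_clear.iloc[np.where((df3_clear.index.hour == 23) | (df3_clear.index.hour == 11))]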
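For anyone who wants to try the whole chain end to end, the following is a self-contained sketch on synthetic data. The random NetLW values, the date bounds and the names winter, at_hours, result and per_hour are assumptions made for illustration. It uses Index.isin for the hour filter instead of np.where, and ends with a groupby on the hour, since the question asks about that:

```python
import numpy as np
import pandas as pd

# Synthetic hourly series standing in for df2 (values are random, not real data)
index = pd.date_range("1997-11-30", "1998-03-02",
                      freq=pd.Timedelta(hours=1), name="rounded_dt")
rng = np.random.default_rng(0)
df2 = pd.DataFrame({"NetLW": rng.uniform(-60.0, 0.0, size=len(index))}, index=index)

# Centred 3-hour mean: the value at 11:00 averages 10:00, 11:00 and 12:00
df3 = df2.rolling(window=3, center=True).mean()

# Label-based slice on the DatetimeIndex; both end dates are included in full
winter = df3.loc["1997-12-01":"1998-02-28"]

# Keep only rows stamped 11:00 or 23:00
at_hours = winter[winter.index.hour.isin([11, 23])]

# Keep only rows whose averaged NetLW is below the threshold
result = at_hours[at_hours["NetLW"] < -30.0]

# Optional: summarise the surviving rows per hour of day
per_hour = result.groupby(result.index.hour)["NetLW"].mean()
```

Doing the rolling mean before any filtering matters here: if the hours were filtered first, the window would no longer contain the neighbouring 10:00 and 12:00 rows.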