I am working on a data frame with a DatetimeIndex of hourly temperature data spanning a couple of years. I want to add a column with the minimum temperature between 20:00 of a day and 8:00 of the following day. Daytime temperatures, from 8:00 to 20:00, are not of interest. The result can either be at the same hourly resolution as the original data or be resampled to days.
I have researched a number of strategies to solve this, but am unsure about the most efficient (primarily in terms of coding effort, secondarily in terms of computing time) and most pythonic way to do this. Some of the possibilities I have come up with:
Use df.index.hour together with groupby or df.loc to find the minimum.
Use df.between_time ( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.between_time.html#pandas.DataFrame.between_time ), though I'm not sure whether the date change over midnight will make this a bit messy.
Original df looks like this:
datetime temp
2009-07-01 01:00:00 17.16
2009-07-01 02:00:00 16.64
2009-07-01 03:00:00 16.21 #<-- minimum for the night 2009-06-30 (previous date, since the period starts at 2009-06-30 20:00)
... ...
2019-06-24 22:00:00 14.03 #<-- minimum for the night 2019-06-24
2019-06-24 23:00:00 18.87
2019-06-25 00:00:00 17.85
2019-06-25 01:00:00 17.25
I want to get something like this (min temp from day 20:00 to day+1 8:00):
datetime temp
2009-06-30 23:00:00 16.21
2009-07-01 00:00:00 16.21
2009-07-01 01:00:00 16.21
2009-07-01 02:00:00 16.21
2009-07-01 03:00:00 16.21
... ...
2019-06-24 22:00:00 14.03
2019-06-24 23:00:00 14.03
2019-06-25 00:00:00 14.03
2019-06-25 01:00:00 14.03
or a bit more succinct:
datetime temp
2009-06-30 16.21
... ...
2019-06-24 14.03
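Use the base option of resample to shift the 12-hour bins so that they start at 08:00 and 20:00 (base was deprecated in pandas 1.1 and removed in 2.0, where offset='8h' does the same job):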
rs = df.resample('12h', base=8).min()
Then keep only the rows for 20:00:
rs[rs.index.hour == 20]
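A minimal sketch of the two steps above on synthetic data. The temperature values and the names idx and nightly are made up for illustration, and offset='8h' is used in place of base=8 on the assumption that pandas 1.1 or later is installed.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data (illustrative only): three full days starting at midnight.
idx = pd.date_range("2009-07-01 00:00", "2009-07-03 23:00", freq="h")
hours = idx.hour.to_numpy()
days = np.arange(len(idx)) // 24          # 0, 1, 2 for the three days
temp = 10.0 + np.abs(hours - 4) + days    # coldest at 04:00, one degree warmer each day
df = pd.DataFrame({"temp": temp}, index=idx)

# 12-hour bins whose edges fall on 08:00 and 20:00.
rs = df.resample("12h", offset="8h").min()

# Keep only the night bins, i.e. those labelled 20:00.
nightly = rs.loc[rs.index.hour == 20, "temp"].copy()

# Relabel each night with the date on which it starts.
nightly.index = nightly.index.normalize()
print(nightly)
```

Normalizing the index gives the succinct one-row-per-night form, labelled with the date on which the night starts.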
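You can use pd.Grouper (the replacement for the old TimeGrouper) with freq='12h' and base=8 (offset='8h' in pandas 1.1 and later) to split the dataframe into 12-hour chunks running from 20:00 to 08:00 of the following day, then just call .min(). If the data also contains daytime hours, the chunks labelled 08:00 have to be dropped afterwards. Try this: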
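import pandas as pd
from io import StringIO

# Sample data; the two columns are separated by two or more spaces
s = """
datetime  temp
2009-07-01 01:00:00  17.16
2009-07-01 02:00:00  16.64
2009-07-01 03:00:00  16.21
2019-06-24 22:00:00  14.03
2019-06-24 23:00:00  18.87
2019-06-25 00:00:00  17.85
2019-06-25 01:00:00  17.25"""

# Raw-string regex separator (handled by the python engine): split on 2+ whitespace characters
df = pd.read_csv(StringIO(s), sep=r"\s\s+", engine="python")
df['datetime'] = pd.to_datetime(df['datetime'])

# 12-hour bins starting at 08:00 and 20:00; offset='8h' stands in for the original base=8,
# which was deprecated in pandas 1.1 and removed in 2.0
result = df.sort_values('datetime').groupby(pd.Grouper(freq='12h', offset='8h', key='datetime')).min()['temp'].dropna()
print(result)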
Output:
datetime
2009-06-30 20:00:00 16.21
2019-06-24 20:00:00 14.03
Name: temp, dtype: float64
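If the original hourly resolution should be kept, one possible sketch is to label every night hour with the date on which its night starts and broadcast the group minimum back with transform. This is not part of the answers above; the synthetic data, the column name night_min and the helper variables is_night and night_label are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data (illustrative only): three full days starting at midnight.
idx = pd.date_range("2009-07-01 00:00", "2009-07-03 23:00", freq="h")
hours = idx.hour.to_numpy()
days = np.arange(len(idx)) // 24
temp = 10.0 + np.abs(hours - 4) + days    # coldest at 04:00, one degree warmer each day
df = pd.DataFrame({"temp": temp}, index=idx)

# Night hours are 20:00-23:59 and 00:00-07:59.
is_night = (df.index.hour >= 20) | (df.index.hour < 8)

# Shifting back by 20 hours maps 20:00 (day D) .. 07:59 (day D+1) onto day D.
night_label = (df.index - pd.Timedelta(hours=20)).normalize()

# Minimum per night, repeated on every night row; daytime rows stay NaN.
df["night_min"] = (
    df.loc[is_night, "temp"]
    .groupby(night_label[is_night])
    .transform("min")
)
print(df.head(12))
```

Because the assignment aligns on the index, the daytime rows are simply left as NaN and can be dropped with dropna() if they are not wanted.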