(Not duplicate / I did my research)
My minute-based dataframe looks like this:
time, price_bool, price_date
2017-01-01 00:00:00, False,
2017-01-01 00:01:00, False,
2017-01-01 00:02:00, True, 2017-01-01 00:02:00
2017-01-01 00:03:00, False,
2017-01-01 00:04:00, False,
2017-01-01 00:05:00, True, 2017-01-01 00:05:00
....
Right now it is a minute-based dataset. I want to group it by day, keep the first occurrence of True for each day, and skip to the next day once that first True is found. If there is no True on a given day, then that day should have 0 in price_date.
My new dataframe should look like this:
time, price_bool, price_date
2017-01-01 00:00:00, True, 2017-01-01 00:02:00
2017-01-02 00:00:00, True, 2017-01-02 00:07:00
2017-01-03 00:00:00, True, 2017-01-03 02:21:00
2017-01-04 00:00:00, True, 2017-01-04 01:17:00
....
This is the day-based dataset, where price_bool is True and price_date is the corresponding time at which it was first True on that day.
What did I do? I tried groupby('time'). However, it does not work.
Simpler starting data:
import numpy as np
import pandas as pd

# pd.np was removed from pandas, so use numpy's nan directly.
df = pd.DataFrame([
["2017-01-01 00:00:00",False,np.nan],
["2017-01-01 00:00:01",True,"2017-01-01 00:00:01"],
["2017-01-01 00:00:02",True,"2017-01-01 00:00:01"],
["2017-01-02 00:00:00",False,np.nan],
], columns=['time','price_bool','price_date'])
df['time'] = pd.to_datetime(df['time'])
This should get you the data you show in your result (note this assumes you're already sorted in chronological order):
res = df[df['price_bool'] == True].groupby(df['time'].dt.date)[['price_bool','price_date']].first().reset_index()
However, I think you're saying that you want to keep dates where price_bool is always False and have the price_date be 0 in that case. So you would need to add back the dates that are missing in res above. Here's one option:
# Get the True data set right.
res = df[df['price_bool'] == True].groupby(df['time'].dt.date)[['price_bool','price_date']].first()
# Add back the missing dates with only False values
res = res.reindex(df['time'].dt.date.unique()).reset_index()
# Fill in the null values.
res = res.fillna({'price_bool':False, 'price_date':0})
Out (note I created a simpler starting data set):
time price_bool price_date
0 2017-01-01 True 2017-01-01 00:00:01
1 2017-01-02 False 0
df.sort_values(['price_bool', 'time'], ascending=[False, True]).groupby(df['time'].dt.date).first()
Output with your provided df:
>>> df
time price_bool
2017-01-01 True
Explanation: You want to sort by two columns, time and price_bool. The latter needs to be sorted in descending order because you want True to appear before False. Then, since groupby preserves the order of rows within each group, you can simply select the first element from each group after grouping by date.
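A minimal sketch of that idea, on made-up sample rows (the data and the names ordered and first_rows are assumptions for illustration, not from the question). It sorts on both keys in a single sort_values call so the time order among the True rows is guaranteed, and it uses head(1) instead of first() because first() returns the first non-null value per column rather than the first row:

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data.
df = pd.DataFrame(
    {
        "time": pd.to_datetime(
            [
                "2017-01-01 00:02:00",
                "2017-01-01 00:00:00",
                "2017-01-01 00:01:00",
                "2017-01-02 00:00:00",
            ]
        ),
        "price_bool": [True, False, True, False],
    }
)

# True rows first (descending), and within that, earliest time first.
ordered = df.sort_values(["price_bool", "time"], ascending=[False, True])

# groupby keeps the row order inside each group, so head(1) is the
# earliest True row of the day, or the earliest row if the day has no True.
first_rows = ordered.groupby(ordered["time"].dt.date).head(1)
print(first_rows)
```

Days without any True still show up here, with price_bool equal to False, so they can be filled with a placeholder afterwards.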
IIUC:
first_true_daily = df.groupby(pd.Grouper(key='time', freq='D'))['price_bool'].idxmax()
df.loc[first_true_daily]
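One thing to keep in mind with idxmax: on a day where every price_bool is False, it still returns a row (the first one of that day), so those days are kept with price_bool False. A hedged, self-contained sketch on made-up rows follows. The sample data and the names day, first_idx and daily are assumptions, and the cast to int is only there to keep idxmax on plain numeric values:

```python
import pandas as pd

# Hypothetical sample data in the shape described in the question.
df = pd.DataFrame(
    [
        ["2017-01-01 00:00:00", False, None],
        ["2017-01-01 00:01:00", True, "2017-01-01 00:01:00"],
        ["2017-01-01 00:02:00", True, "2017-01-01 00:02:00"],
        ["2017-01-02 00:00:00", False, None],
        ["2017-01-02 00:01:00", False, None],
    ],
    columns=["time", "price_bool", "price_date"],
)
df["time"] = pd.to_datetime(df["time"])
df["price_date"] = pd.to_datetime(df["price_date"])

# Calendar day of each row (midnight timestamps).
day = df["time"].dt.normalize()

# idxmax gives the label of the first maximum per day: the first True,
# or simply the first row when the whole day is False.
first_idx = df["price_bool"].astype(int).groupby(day).idxmax()

daily = df.loc[first_idx].copy()
daily["time"] = daily["time"].dt.normalize()
daily = daily.reset_index(drop=True)
print(daily)
```

In this sketch the days without a True keep a missing price_date (NaT); replacing it with 0, as the question asks, would be a separate fill step.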