This is my dataset (pandas DataFrame df):
DateTime INDICATOR
2017-01-01 10:35:00 0
2017-01-01 10:40:00 0
2017-01-01 10:45:00 0
2017-01-01 10:50:00 0
2017-01-01 10:55:00 0
2017-01-01 11:00:00 0
2017-01-01 11:05:00 1
2017-01-01 11:10:00 1
2017-01-01 11:15:00 1
2017-01-01 11:20:00 1
2017-01-01 11:25:00 0
2017-01-01 11:30:00 0
2017-01-01 11:35:00 1
2017-01-01 11:40:00 1
2017-01-01 11:45:00 1
The column DateTime is of type datetime64[ns].
I want to obtain the duration (in minutes) of the data segments where INDICATOR is equal to 1.
The expected result is:
[15, 10]
This is how I tried to solve this task, but I only get 0 values:
s=df["INDICATOR"].eq(1)
df1=df[s].copy()
s1=df1.groupby(s.cumsum())["DateTime"].transform(lambda x : x.max()-x.min()).dt.seconds
All values of s1 are 0.
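The zeros come from the grouping key. s is a boolean Series, so s.cumsum() goes up by one on every row where INDICATOR is 1; each selected row therefore lands in its own group, and max() - min() of a single timestamp is 0. A minimal sketch of one possible fix, assuming the sample data is rebuilt by hand with pd.date_range: group by (~s).cumsum() instead, which only goes up on 0 rows and so stays constant across each run of 1s.

```python
import pandas as pd

# Sample data from the question, rebuilt by hand at 5-minute intervals
df = pd.DataFrame({
    "DateTime": pd.date_range("2017-01-01 10:35", periods=15, freq="5min"),
    "INDICATOR": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
})

s = df["INDICATOR"].eq(1)

# Original key: increments on every 1-row, so each selected row is its own group
print(s.cumsum()[s].tolist())  # [1, 2, 3, 4, 5, 6, 7]

# Alternative key: increments only on 0-rows, so it is constant within a run of 1s
key = (~s).cumsum()
print(key[s].tolist())  # [6, 6, 6, 6, 8, 8, 8]

# Per-run duration as max - min of the timestamps in that run
g = df[s].groupby(key[s])["DateTime"]
durations = g.max() - g.min()
minutes = (durations.dt.total_seconds() // 60).astype(int).tolist()
print(minutes)  # [15, 10]
```

The group labels themselves (6 and 8 here) carry no meaning; only the fact that they differ between runs matters.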
First, create a group ID that increases every time INDICATOR changes value:
gb_ID = df.INDICATOR.diff().ne(0).cumsum()
Next, keep only the rows where INDICATOR == 1 and group them by gb_ID. Take the min and max of DateTime per group, then the difference between max and min. Finally, select the max column (the min column is all NaT after the diff), convert it to whole minutes, and call values to return an array:
df.query('INDICATOR == 1').groupby(gb_ID)['DateTime'].agg(['min', 'max']) \
.diff(axis=1)['max'].dt.seconds.floordiv(60).values
Out[351]: array([15, 10], dtype=int64)
Below is the intermediate DataFrame, before selecting the non-NaT column and calling values:
df.query('INDICATOR == 1').groupby(gb_ID)['DateTime'].agg(['min', 'max']).diff(axis=1)
Out[362]:
min max
INDICATOR
2 NaT 00:15:00
4 NaT 00:10:00
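A self-contained sketch of the same diff/ne/cumsum grouping, assuming the sample data is rebuilt by hand. It subtracts the per-group min from the per-group max directly instead of going through agg and diff(axis=1), and it uses .dt.total_seconds() rather than .dt.seconds: .seconds is only the seconds component of a Timedelta (0 to 86399), so a run lasting more than a day would wrap around.

```python
import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.date_range("2017-01-01 10:35", periods=15, freq="5min"),
    "INDICATOR": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
})

# A new group starts wherever INDICATOR changes value; the first row also starts
# one, because diff() gives NaN there and NaN != 0
gb_ID = df["INDICATOR"].diff().ne(0).cumsum()
print(gb_ID.tolist())  # [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4]

ones = df["INDICATOR"].eq(1)
g = df.loc[ones, "DateTime"].groupby(gb_ID[ones])
durations = g.max() - g.min()

# total_seconds() covers the whole Timedelta, not just its seconds component
minutes = (durations.dt.total_seconds() // 60).astype(int).to_numpy()
print(minutes)  # [15 10]

# Why it matters: .seconds drops whole days
td = pd.Timedelta(days=1, minutes=15)
print(td.seconds // 60, int(td.total_seconds() // 60))  # 15 1455
```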
Taking this post into account, I was thinking of splitting the DataFrame into subframes with np.split().
Try this:
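import numpy as np
import pandas as pd
# the pieces below use DateTime as their index, so move it out of the columns
ts = df.set_index("DateTime")
# split ts at every row where INDICATOR is 0; each piece starts at such a row
splitted_dfs = np.split(ts, np.where(ts.INDICATOR == 0)[0])
results = []
for split in splitted_dfs:
    # iloc[1:] omits the leading 0 entry of each piece
    ones = split.iloc[1:]
    results.append(ones.index.max() - ones.index.min())
# pieces without any 1 rows give NaT; skip them and convert the rest to minutes
print([int(x.total_seconds() // 60) for x in results if not pd.isna(x)])
# prints [15, 10]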
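Explanation
np.split() with the positions of the rows where INDICATOR == 0 starts a new piece at each of those rows, so every non-empty piece begins with a 0 row. This yields the following list of DataFrames (DateTime as the index, INDICATOR as the column):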
Piece 1: (empty)
Piece 2: 2017-01-01 10:35:00 0
Piece 3: 2017-01-01 10:40:00 0
Piece 4: 2017-01-01 10:45:00 0
Piece 5: 2017-01-01 10:50:00 0
Piece 6: 2017-01-01 10:55:00 0
Piece 7:
2017-01-01 11:00:00 0
2017-01-01 11:05:00 1
2017-01-01 11:10:00 1
2017-01-01 11:15:00 1
2017-01-01 11:20:00 1
Piece 8: 2017-01-01 11:25:00 0
Piece 9:
2017-01-01 11:30:00 0
2017-01-01 11:35:00 1
2017-01-01 11:40:00 1
2017-01-01 11:45:00 1
You can iterate over that list, drop the leading 0 row of each piece, and ignore the pieces that are empty or become empty once that row is removed.
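A variation on the np.split idea, assuming DateTime is a regular column as stated in the question. It splits plain NumPy arrays rather than the DataFrame itself (passing a DataFrame to np.split depends on NumPy calling DataFrame methods internally, and recent pandas versions may emit a deprecation warning for it), and it keeps only the 1 rows of each piece with a mask instead of iloc[1:], so a frame that starts with a 1 row is handled too.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "DateTime": pd.date_range("2017-01-01 10:35", periods=15, freq="5min"),
    "INDICATOR": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1],
})

times = df["DateTime"].to_numpy()   # datetime64[ns] array
flags = df["INDICATOR"].to_numpy()

# Positions of the 0 rows; np.split starts a new piece at each of them
cuts = np.flatnonzero(flags == 0)

minutes = []
for t, f in zip(np.split(times, cuts), np.split(flags, cuts)):
    run = t[f == 1]  # keep only the 1-rows (drops the leading 0 row, if any)
    if run.size == 0:
        continue  # piece held no 1-rows
    span = run.max() - run.min()  # numpy.timedelta64
    minutes.append(int(span // np.timedelta64(1, "m")))

print(minutes)  # [15, 10]
```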