I want to measure the duration of every phase in my series. By phase I mean a run of consecutive 1s or 0s, for example:
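import pandas as pd
# 'min' is the one-minute frequency alias ('T' is deprecated in recent pandas)
rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0,1,1,1,0,0,1,0,1,0,1,1,1,1,0], index=rng)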
I would like as output:
phase0 -> zeros: 1 minute, ones: 3 minutes,
phase1 -> zeros: 6 minutes, ones: 4 minutes,
etc
In this case a run of 1s is only valid if its length is >= 3.
I was able to find the runs that are too short with this:
index_to_remove=s.groupby((s.shift() != s).cumsum()).filter(lambda x: len(x) < 3).index
Now I can set the elements at those index positions to 0 in the original time series:
s[index_to_remove]=0
What is missing is counting the minutes of every phase.
Can someone help me? I am interested in a smart way of doing it. I am not very proud of what I have used so far, so if you can suggest a better way I would appreciate it.
Thank you all
*** I know I should work with s.diff(): when this new series goes from 1 to -1 it is a phase of ones, while when it goes from -1 to 1 it is a phase of zeros.
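To make the s.diff() idea concrete, here is a minimal sketch. It assumes the short runs have already been set to 0 and that the series starts and ends with 0, so every rise has a matching fall. The names starts_of_ones, starts_of_zeros and ones_minutes are only illustrative.

```python
import pandas as pd

rng = pd.date_range('2015-02-24', periods=15, freq='min')
# assumed input: the series after runs of 1s shorter than 3 were set to 0
s = pd.Series([0,1,1,1,0,0,0,0,0,0,1,1,1,1,0], index=rng)

d = s.diff()
starts_of_ones = d[d == 1].index    # 0 -> 1: a phase of ones begins
starts_of_zeros = d[d == -1].index  # 1 -> 0: a phase of zeros begins

# a run of ones lasts from its rise to the next fall
ones_minutes = [(end - start) // pd.Timedelta(minutes=1)
                for start, end in zip(starts_of_ones, starts_of_zeros)]
print(ones_minutes)
```

This only gives the lengths of the runs of ones. The zeros need the same pairing in the other direction plus special handling at both ends of the series, which is why a groupby-based approach tends to be simpler.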
I think you need to aggregate min and max per group, take the difference, convert it to minutes, add 1 minute (so both end points are counted) and reshape to a DataFrame:
# faster way to set runs shorter than 3 to 0
m = s.groupby((s.shift() != s).cumsum()).transform('size') < 3
s[m] = 0
# start a new group at every 1 -> 0 transition, so each group holds one 0/1 pair
res = (s.eq(0) & s.shift().eq(1)).cumsum()
print(res)
df = s.index.to_series().groupby([res, s]).agg(['min', 'max'])
df = (df['max'].sub(df['min'])
               .dt.total_seconds()
               .div(60)
               .add(1)
               .unstack(fill_value=0)
               .astype(int)
               .rename_axis('phase'))
print(df)
       0  1
phase
0      1  3
1      6  4
2      1  0
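As a variation on the above, if the index really has exactly one sample per minute with no gaps (an assumption, true for the example data), counting the rows in each group should give the same minutes without the min/max arithmetic. The names run_id, phase and counts are only illustrative.

```python
import pandas as pd

rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0,1,1,1,0,0,1,0,1,0,1,1,1,1,0], index=rng)

# label each run of equal values, then zero out runs shorter than 3
run_id = (s.shift() != s).cumsum()
s[s.groupby(run_id).transform('size') < 3] = 0

# a new phase starts at every 1 -> 0 transition
phase = (s.eq(0) & s.shift().eq(1)).cumsum()

# one row per minute, so the group size is the duration in minutes
counts = (s.groupby([phase, s])
           .size()
           .unstack(fill_value=0)
           .rename_axis('phase'))
print(counts)
```

If the sampling were irregular, the row count would no longer equal elapsed time, and the min/max difference would be the safer choice.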
*** This is the best solution I found:
from itertools import groupby
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]
but I can't handle grouping the 0s and 1s together into phases.
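One possible way to pair the runs, sketched under the assumption that a phase is a run of 0s followed by the next run of 1s and that the short runs were already set to 0. The values list and the phases name are only illustrative.

```python
from itertools import groupby

# assumed input: the series values after short runs of 1s were set to 0
values = [0,1,1,1,0,0,0,0,0,0,1,1,1,1,0]

# (label, run length) for every run of equal values
runs = [(label, sum(1 for _ in group)) for label, group in groupby(values)]

phases = []
for label, length in runs:
    # every run of 0s opens a new phase; so does a leading run of 1s
    if label == 0 or not phases:
        phases.append({0: 0, 1: 0})
    phases[-1][label] += length

for i, p in enumerate(phases):
    print(f"phase{i} -> zeros: {p[0]}, ones: {p[1]}")
```

With one sample per minute, the run lengths are directly the durations in minutes.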