
Count valuable (repeated at least n times) runs in a pandas time series

I want to count the duration of every phase in my series. By phase I mean a run of consecutive 0s together with the run of consecutive 1s that follows it. For example:

import pandas as pd

rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0,1,1,1,0,0,1,0,1,0,1,1,1,1,0], index=rng)

I would like this output:

phase0 -> zeros: 1 minute, ones: 3 minutes,
phase1 -> zeros: 6 minutes, ones: 4 minutes,
etc.

In this case, valuable means repeated at least 3 times (>= 3).

I was able to find the runs with too few repetitions (fewer than 3) with this:

index_to_remove=s.groupby((s.shift() != s).cumsum()).filter(lambda x: len(x) < 3).index

Then I set the elements at that index to 0 in the original time series:

s.loc[index_to_remove] = 0
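The grouping key `(s.shift() != s).cumsum()` in the filter above gives every run of identical consecutive values its own integer id: a new run starts wherever a value differs from the one before it. A small sketch that rebuilds the example series (the names `run_id` and `runs` are only illustrative) and shows the value and length of each run:

```python
import pandas as pd

rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0], index=rng)

# True wherever the value changes (the first element is compared with NaN,
# so it is True as well); the running sum numbers the runs 1, 2, 3, ...
run_id = (s.shift() != s).cumsum()

# One row per run: the repeated value and how many samples the run lasts.
runs = s.groupby(run_id).agg(['first', 'size'])
print(runs)
```

With the example data this gives nine runs with lengths 1, 3, 2, 1, 1, 1, 1, 4, 1, so only the two runs of ones with lengths 3 and 4 stay as ones after the `< 3` filter.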

What is missing is counting the minutes of every phase.

Can someone help me? I am interested in a smart way of doing it. I am not very proud of what I have used so far, so I would appreciate a better way.

Thank you all

*** I know I should work with s.diff(): in the differenced series, the stretch from a 1 to the next -1 is a phase of ones, while the stretch from a -1 to the next 1 is a phase of zeros.
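The `s.diff()` idea can be made to work: after the short runs are zeroed, `diff()` is +1 where a run of ones starts and -1 where a run of zeros starts, so a cumulative count of the -1 steps numbers the phases. A sketch under the assumption that the samples are exactly one minute apart (so counting samples gives minutes); `short`, `phase` and `minutes` are illustrative names:

```python
import pandas as pd

rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0], index=rng)

# Zero every run shorter than 3 samples (returns a new Series).
short = s.groupby((s.shift() != s).cumsum()).transform('size') < 3
s = s.mask(short, 0)

# diff() is +1 at each 0 -> 1 step and -1 at each 1 -> 0 step (NaN first).
d = s.diff()

# Every 1 -> 0 step opens a new phase; phase 0 starts at the first sample.
phase = d.eq(-1).cumsum()

# One sample per minute, so the number of samples per (phase, value) is minutes.
minutes = s.groupby([phase, s]).size().unstack(fill_value=0)
print(minutes)
```

For the example this gives 1 and 3 minutes in phase 0, 6 and 4 in phase 1, and 1 and 0 in phase 2, matching the expected output.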

I think you need to aggregate the min and max timestamp of each run, take their difference, convert it to minutes and add 1 minute (for the last sample itself), then reshape to a DataFrame:

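# faster way to set runs to 0 by their length per group
m = s.groupby((s.shift() != s).cumsum()).transform('size') < 3
s[m] = 0

# create group ids for (0s, 1s) pairs: the id increases at every 1 -> 0 step
res = (s.eq(0) & s.shift().eq(1)).cumsum()
print(res)

# first and last timestamp of each run of 0s / 1s within each phase
df = s.index.to_series().groupby([res, s]).agg(['min','max'])
# (last - first) in minutes + 1 minute for the last sample (1-minute frequency)
df = (df['max'].sub(df['min'])
               .dt.total_seconds()
               .div(60)
               .add(1)
               .unstack(fill_value=0)
               .astype(int)
               .rename_axis('phase'))
print(df)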
       0  1
phase      
0      1  3
1      6  4
2      1  0
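The steps of this answer can also be wrapped into one function that takes the threshold from the title as a parameter and leaves the input untouched by using `Series.mask` instead of assigning into `s`. This is a sketch, not part of the original answer: the function name `phase_minutes` and its `n` parameter are hypothetical, and the `+ 1` minute still assumes a regular 1-minute index.

```python
import pandas as pd


def phase_minutes(s: pd.Series, n: int = 3) -> pd.DataFrame:
    """Minutes of 0s and 1s per phase, after zeroing runs shorter than n.

    Hypothetical helper; assumes a regular 1-minute DatetimeIndex.
    The input Series is not modified.
    """
    # Zero every run shorter than n samples (mask returns a new Series).
    run_id = (s.shift() != s).cumsum()
    clean = s.mask(s.groupby(run_id).transform('size') < n, 0)

    # A new phase starts at every 1 -> 0 step.
    phase = (clean.eq(0) & clean.shift().eq(1)).cumsum()

    # First and last timestamp of each run of 0s / 1s within each phase.
    bounds = clean.index.to_series().groupby([phase, clean]).agg(['min', 'max'])

    # (last - first) in minutes, plus 1 minute for the last sample itself.
    return (bounds['max'].sub(bounds['min'])
            .dt.total_seconds()
            .div(60)
            .add(1)
            .unstack(fill_value=0)
            .astype(int)
            .rename_axis('phase'))


rng = pd.date_range('2015-02-24', periods=15, freq='min')
s = pd.Series([0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0], index=rng)
print(phase_minutes(s, n=3))
print(phase_minutes(s, n=4))
```

With `n=4` the 3-minute run of ones is zeroed as well, so the first phase becomes 10 minutes of zeros and 4 of ones.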

*** This is the best solution I found:

from itertools import groupby

# run-length encode the series: list of (value, run length) pairs
groups = groupby(s)
result = [(label, sum(1 for _ in group)) for label, group in groups]

but I can't handle grouping each run of 0s together with the run of 1s that follows it.
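To extend the `itertools.groupby` approach, one option is to walk the (value, length) pairs, start a new phase at every run of 0s, and add the following run of 1s to it. A minimal sketch, assuming the short runs have already been set to 0 (the list below is what `s.tolist()` holds at that point) and that each sample is one minute, so run lengths are minutes; `values` and `phases` are illustrative names:

```python
from itertools import groupby

# Values of the example series after runs shorter than 3 were set to 0.
values = [0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

# Run-length encoding: [(value, run length), ...]
runs = [(label, sum(1 for _ in group)) for label, group in groupby(values)]

phases = []
for label, length in runs:
    # A run of 0s opens a new phase; so does a run of 1s at the very start.
    if label == 0 or not phases:
        phases.append({0: 0, 1: 0})
    phases[-1][label] += length

print(phases)  # [{0: 1, 1: 3}, {0: 6, 1: 4}, {0: 1, 1: 0}]
```

Because `groupby` merges equal neighbours, the runs always alternate between 0 and 1, so each phase receives at most one run of each value.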

The technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM