简体   繁体   中英

use time series data in python to calculate mean, variance std deviation

I have data collected from sensors that looks like:

sec   nanosec value 

1001   1       0.2 

1001   2       0.2

1001   3       0.2 

1002   1       0.1  

1002   2       0.2   

1002   3       0.1 

1003   1       0.2 

1003   2       0.2

1003   3       0.1  

1004   1       0.2   

1004   2       0.2 

1004   3       0.2 

1004   4      0.1 

I want to calculate average,std deviation and some other stats like maximum, minimum for a column every 2 seconds. so average for (1001, 1002)= 0.167, average of (1003,1004)=0.17

From the tutorials http://earthpy.org/pandas-basics.html , I think I should convert it to time series and the use rolling _means from pandas, but I am new to time series data so I am not sure if that is the correct way. Also how do I specify frequency here for conversion as observations for the first second have less observations. So for actual data I have less than 100 readings for 1001 second and then 100 observations for 1002 second onwards.

I could also do a simple groupby on seconds but it would group readings per second and not every 2 seconds, then how could i combine observations for 2 consecutive groups from groupby and then do analysis.

I think you can first convert column sec to_timedelta , set_index and resample by 2 seconds ( 2S ):

df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1
print (df.value.resample('2S').mean())
sec
00:16:41    0.166667
00:16:43    0.171429
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').std())
sec
00:16:41    0.051640
00:16:43    0.048795
00:16:45         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S').max())
sec
00:16:41    0.2
00:16:43    0.2
00:16:45    NaN
Freq: 2S, Name: value, dtype: float64

Maybe you need change base in resample :

print (df.value.resample('2S', base=1).mean())
sec
00:16:42    0.166667
00:16:44    0.171429
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).std())
sec
00:16:42    0.051640
00:16:44    0.048795
00:16:46         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=1).max())
sec
00:16:42    0.2
00:16:44    0.2
00:16:46    NaN
Freq: 2S, Name: value, dtype: float64
print (df.value.resample('2S', base=2).mean())
sec
00:16:43    0.166667
00:16:45    0.171429
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).std())
sec
00:16:43    0.051640
00:16:45    0.048795
00:16:47         NaN
Freq: 2S, Name: value, dtype: float64

print (df.value.resample('2S', base=2).max())
sec
00:16:43    0.2
00:16:45    0.2
00:16:47    NaN
Freq: 2S, Name: value, dtype: float64

Borrowing jezrael's code to setup things up:

df['sec'] = pd.to_timedelta(df.sec, unit='s')
df.set_index('sec', inplace=True)
print (df)
          nanosec  value
sec                     
00:16:41        1    0.2
00:16:41        2    0.2
00:16:41        3    0.2
00:16:42        1    0.1
00:16:42        2    0.2
00:16:42        3    0.1
00:16:43        1    0.2
00:16:43        2    0.2
00:16:43        3    0.1
00:16:44        1    0.2
00:16:44        2    0.2
00:16:44        3    0.2
00:16:44        4    0.1

Use pd.TimeGrouper('2S') and describe()

df.groupby(pd.TimeGrouper('2S')).describe()

在此输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM