
Standard deviation of time series data on two columns

I have a data frame with two columns of data for one day, indexed by a time series. The data is sampled at 1-minute intervals, and I want to create a 5-minute data frame in which each 5-minute interval is flagged False when the standard deviation of its 5 samples exceeds 5% of the mean of those 5 samples. This needs to be done for every 5-minute interval in the day and for each column. As seen below, for column X of DF1 we calculate the mean and standard deviation of the 5 samples from 16:01 to 16:05 and look at the percentage Std/Mean; the same is done for the next 5 samples and for column Y. DF2 is then populated: if Std/Mean > 5%, that 5-minute interval is False.

[Image: enter image description here]

You can use the resample method of a pandas DataFrame; for that, the DataFrame must be indexed by timestamps. Here is an example:

import pandas as pd
import numpy as np
dates = pd.date_range('1/1/2020', periods=30)
df = pd.DataFrame(np.random.randn(30,2), index=dates, columns=['X','Y'])
df.head()

lbl = 'right'  # label each window with its right bin edge
w = '3D'  # window length; use '5min' for 5-minute windows on 1-minute data
threshold = 1  # threshold for the ratio of standard deviation to mean; use 0.05 for 5%
windows = df.resample(w, label=lbl)  # build the resampler once and reuse it
ratio = windows.std() / windows.mean()  # per-window std/mean for columns X and Y

# True where the ratio exceeds the threshold; use `ratio <= threshold` instead
# if an interval should be False when the threshold is exceeded
DF2 = ratio > threshold
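
The same idea can be applied directly to the setup in the question. The following is only a minimal sketch under a few assumptions: the sample values are made up, the helper name `flag_stable_intervals` is hypothetical, the windows are closed on the right so that 16:01 to 16:05 form one interval labelled 16:05, and the absolute value of the mean is used so that a negative mean does not flip the comparison.

```python
import numpy as np
import pandas as pd


def flag_stable_intervals(df, window="5min", max_ratio=0.05):
    """Return a boolean frame that is False where std/|mean| exceeds max_ratio.

    Hypothetical helper for illustration; not part of pandas.
    """
    # closed='right' puts 16:01..16:05 into the window labelled 16:05
    windows = df.resample(window, label="right", closed="right")
    ratio = windows.std() / windows.mean().abs()  # sample std (ddof=1) over |mean|
    # A NaN ratio (e.g. mean of zero) compares as False here
    return ratio <= max_ratio


# Ten made-up 1-minute samples from 16:01 to 16:10
index = pd.date_range("2020-01-01 16:01", periods=10, freq="min")
df1 = pd.DataFrame(
    {
        "X": [100, 101, 99, 100, 100, 100, 120, 80, 110, 90],
        "Y": [50, 60, 40, 55, 45, 10, 10, 10, 10, 10],
    },
    index=index,
)

df2 = flag_stable_intervals(df1)
print(df2)
```

With these values, X is stable in the first interval and noisy in the second, while Y is the other way around, so each column gets one True and one False flag. If the intervals should instead start on the labelled minute (16:00 to 16:04), dropping `closed="right"` and `label="right"` falls back to the pandas defaults for minute frequencies.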
