How to do standardization on time series data with Scikit-learn Standard Scaler?

I am using Keras, so my data has the shape (batch_size, timesteps, input_dim), but StandardScaler only accepts 2D data.

One solution I thought of was to call partial_fit on each sample and then transform each sample:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])

for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])

Is this approach correct and efficient?

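You have two possibilities. Take some example data, where bsize, time and feats are the batch size, the number of timesteps and the number of features:

data = np.random.randn(bsize*time*feats).reshape((bsize,time,feats))

Version 1 does what you propose: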

scaler = StandardScaler()
for sample in range(data.shape[0]):
    scaler.partial_fit(data[sample])

for sample in range(data.shape[0]):
    data[sample] = scaler.transform(data[sample])

Another possibility (Version 2) is to merge the batch and time axes into one, so that the array becomes 2D, fit and transform it in a single call, and then reshape it back to 3D:

scaler = StandardScaler()
data   = scaler.fit_transform(data.reshape((bsize*time,feats))).reshape((bsize,time,feats))
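To make explicit what Version 2 computes, here is a minimal sketch that compares it with a plain NumPy calculation. The array sizes and the random generator seed are assumptions chosen only for illustration. Each feature is standardized using the mean and standard deviation taken over all samples and all timesteps together.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sizes, chosen only for illustration.
bsize, time, feats = 32, 10, 4
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=3.0, size=(bsize, time, feats))

# Version 2: merge (batch, time) into one axis, fit once, restore the shape.
scaler = StandardScaler()
scaled = scaler.fit_transform(data.reshape(-1, feats)).reshape(data.shape)

# The same thing by hand: per-feature statistics over the batch and time axes.
# StandardScaler uses the population standard deviation (ddof=0), like np.std.
manual = (data - data.mean(axis=(0, 1))) / data.std(axis=(0, 1))

matches_manual = np.allclose(scaled, manual)
feature_means = scaled.mean(axis=(0, 1))  # close to 0 for every feature
feature_stds = scaled.std(axis=(0, 1))    # close to 1 for every feature
```

Using `reshape(-1, feats)` instead of `bsize*time` lets NumPy infer the merged axis length, so the same line works for any batch size.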

On my machine:

Version 1 takes 0.8759770393371582 seconds

Version 2 takes 0.11733722686767578 seconds
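The array sizes behind these timings are not given, so the numbers will differ on other machines and inputs. A rough way to repeat the comparison is sketched below; the sizes, the name `n_steps` and the helper functions `version_1` and `version_2` are assumptions introduced for this sketch. It also checks that both versions return the same values up to floating-point error.

```python
from time import perf_counter

import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical sizes; the ones used for the timings above are not stated.
bsize, n_steps, feats = 2000, 50, 8
data = np.random.randn(bsize, n_steps, feats)

def version_1(x):
    """Per-sample partial_fit, then per-sample transform."""
    scaler = StandardScaler()
    for sample in x:
        scaler.partial_fit(sample)
    return np.stack([scaler.transform(sample) for sample in x])

def version_2(x):
    """Single fit_transform on the array reshaped to (batch * time, features)."""
    scaler = StandardScaler()
    return scaler.fit_transform(x.reshape(-1, x.shape[-1])).reshape(x.shape)

start = perf_counter()
out_1 = version_1(data)
elapsed_1 = perf_counter() - start

start = perf_counter()
out_2 = version_2(data)
elapsed_2 = perf_counter() - start

print(f"Version 1 takes {elapsed_1:.4f} seconds")
print(f"Version 2 takes {elapsed_2:.4f} seconds")
print("Same result:", np.allclose(out_1, out_2))
```

Version 1 pays for one Python-level call per sample in both loops, while Version 2 makes a single vectorized call, which is a plausible reason for the gap.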
