
Standardizing data column-wise before using Keras models

I'm working with a large dataset that I want to standardize for use with a CNN.

Does Keras have a quick utility for standardizing a block of numbers column-wise that can be used inside a Sequential model? I'm asking because I expect the data to eventually be used online, so ideally the standardization could also be applied to incoming data, i.e. a trailing moving average of the mean and std would be used to normalize each new batch.

import numpy as np
import pandas as pd

np.random.seed(42)

col_names = ['Column' + str(x+1) for x in range(5)]
training_data = pd.DataFrame(np.random.randint(1,10 **6, 50).reshape(-1,5), columns = col_names)

I am not sure about the online case, but sklearn's StandardScaler() should do the right thing here.

We can import it from sklearn:

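from sklearn.preprocessing import StandardScaler
# StandardScaler standardizes each column (feature) on its own, so no transpose is needed
scaler = StandardScaler()
training_data = pd.DataFrame(scaler.fit_transform(training_data), columns=col_names)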
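
For the online case raised in the question, one possible approach is StandardScaler's partial_fit method, which updates the per-column mean and variance incrementally as new batches arrive. The sketch below is only an illustration: the batch sizes and the simulated stream are assumptions, not part of the original question. Note also that partial_fit accumulates statistics over all data seen so far, so it is a cumulative estimate rather than the trailing moving average the question describes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Hypothetical stream: 10 incoming batches of 20 rows and 5 columns each
batches = [rng.integers(1, 10**6, size=(20, 5)).astype(float) for _ in range(10)]

scaler = StandardScaler()
scaled_batches = []
for batch in batches:
    # Update the running per-column mean and variance with the new batch
    scaler.partial_fit(batch)
    # Scale the batch using the statistics accumulated so far
    scaled_batches.append(scaler.transform(batch))

# After the loop, the running statistics match those of all rows seen so far
all_rows = np.vstack(batches)
```

The scaled batches can then be passed to the model in place of the raw data. If a true trailing window is required, the statistics would have to be computed over only the most recent rows instead.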


 