sklearn StandardScaler returns all zeros

I have a sklearn StandardScaler saved from a previous model and am trying to apply it to new data:

scaler = myOldStandardScaler
print("ORIG:", X)
print("CLASS:", X.__class__)
X = scaler.fit_transform(X)
print("SCALED:", X)

I have three observations, each with 2000 features. If I run each observation separately, I get an output of all zeros:

ORIG: [[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[ 0.  0.  0. ...,  0.  0.  0.]]

But if I append all three observations into one array, I get the results I want:

ORIG: [[  0.00000000e+00   8.69737728e-08   7.53361877e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  9.49627142e-04   0.00000000e+00   0.00000000e+00 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]
[  3.19029839e-04   0.00000000e+00   1.90985485e-06 ...,   0.00000000e+00
0.00000000e+00   0.00000000e+00]]
CLASS: <class 'numpy.matrixlib.defmatrix.matrix'>
SCALED: [[-1.07174217  1.41421356  1.37153077 ...,  0.          0.          0.        ]
[ 1.33494964 -0.70710678 -0.98439142 ...,  0.          0.          0.        ]
[-0.26320747 -0.70710678 -0.38713935 ...,  0.          0.          0.        ]]

I've seen these two questions:

Neither of them has an accepted answer.

I've tried:

  • reshaping from (1, n) to (n, 1) (this gives incorrect results)
  • converting the array to np.float32 and np.float64 (still all zeros)
  • creating an array of an array (again, all zeros)
  • creating a np.matrix (again, all zeros)

What am I missing? The input to fit_transform has the same type, just a different size.

How do I get StandardScaler to work with a single observation?

When you apply the fit_transform method of a StandardScaler to an array of shape (1, n), you will always get all zeros. StandardScaler standardizes each column (feature) separately, and with a single row each column contains just one number. The mean of that column is the number itself, so subtracting it leaves 0; the column's standard deviation is also 0, and scikit-learn replaces a zero scale with 1 instead of dividing by it, so the result stays 0. If you want to scale the n values of your array against each other, convert it to an array of shape (n, 1). You can do it this way:

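import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X = np.array([1, -4, 5, 6, -8, 5])  # replace with your own X as a NumPy array
X_transformed = scaler.fit_transform(X[:, np.newaxis])  # shape (6, 1): six samples of one feature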

In this case you get standard scaling of one object across its own features, which is not what you're looking for.
If you want to scale one feature across your 3 objects, pass fit_transform an array of shape (3, 1) containing that feature's value for each object:

X = np.array([0.00000000e+00, 9.49627142e-04, 3.19029839e-04])  # first feature of the 3 observations
X_transformed = scaler.fit_transform(X[:, np.newaxis])
# array([[-1.07174217], [ 1.33494964], [-0.26320747]]): the first column of the combined result
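To make the arithmetic above concrete, here is a minimal sketch (assuming NumPy and scikit-learn are installed) that recomputes StandardScaler's column-wise formula z = (x - mean) / std by hand. It uses the population standard deviation (ddof=0) and mirrors the rule of using a scale of 1 for zero-variance columns. The helper name manual_standardize is hypothetical, for illustration only; the input values are copied from the question's printed output.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler


def manual_standardize(X):
    """Hypothetical helper: standardize each column of a 2-D array by hand."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)      # per-column (per-feature) mean
    std = X.std(axis=0)        # population std (ddof=0), as StandardScaler uses
    std[std == 0.0] = 1.0      # zero-variance columns get a scale of 1, like StandardScaler
    return (X - mean) / std


# One feature observed on 3 objects: shape (3, 1)
feature = np.array([0.0, 9.49627142e-04, 3.19029839e-04])[:, np.newaxis]
by_hand = manual_standardize(feature)
by_sklearn = StandardScaler().fit_transform(feature)
print(np.allclose(by_hand, by_sklearn))  # True

# A single observation with 4 features: shape (1, 4).
# Each column holds one value, so x - mean is 0 in every column.
single_row = np.array([[3.19029839e-04, 0.0, 1.90985485e-06, 0.0]])
print(StandardScaler().fit_transform(single_row))  # [[0. 0. 0. 0.]]
```

The same function explains both results in the question: with three rows each column has a real spread to divide by, while with one row every column collapses to its own mean.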

If you want to work with an already fitted StandardScaler, you shouldn't use the fit_transform method, because it refits the scaler to the new data and throws away the statistics learned during the original fit. StandardScaler has a transform method, which works with a single observation as long as it is passed as a 2-D array of shape (1, n_features):

X = np.array([1, -4, 5, 6, -8, 5])  # your single observation as a NumPy array (here 6 features)
X_transformed = scaler.transform(X.reshape(1, -1))  # shape (1, 6); scaler must have been fitted on 6 features
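The difference between refitting and reusing a fitted scaler shows up in a short sketch, assuming NumPy and scikit-learn. The 3×4 training matrix is made-up toy data standing in for the saved model's training set; the point is only that fit is called once and transform is then applied to a single (1, n_features) row.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data (hypothetical): 3 observations x 4 features
X_train = np.array([
    [0.0, 2.0, 7.0, 1.0],
    [9.0, 0.0, 0.0, 1.0],
    [3.0, 1.0, 2.0, 1.0],
])

scaler = StandardScaler().fit(X_train)   # learn per-feature mean_ and scale_ once

new_obs = np.array([9.0, 0.0, 0.0, 1.0])  # one new observation, 1-D
row = new_obs.reshape(1, -1)              # shape (1, 4): 1 sample, 4 features

print(scaler.transform(row))              # uses the stored training statistics
print(StandardScaler().fit_transform(row))  # refits on this row alone: all zeros

# transform is (row - mean_) / scale_ with the fitted attributes
manual = (row - scaler.mean_) / scaler.scale_
print(np.allclose(manual, scaler.transform(row)))  # True
```

Passing the 1-D new_obs straight to scaler.transform raises a ValueError asking for a 2-D array, which is why the reshape(1, -1) step is there.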

I had the same problem. Another (simpler) solution for an array of shape (1, n) is to transpose it, which gives shape (n, 1):

X = np.array([[0.00000000e+00, 9.49627142e-04, 3.19029839e-04]])  # 2-D, shape (1, 3)
X_transformed = scaler.transform(X.T)  # X.T has shape (3, 1); scaler must have been fitted on 1 feature
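Whether transposing helps depends on the input's dimensionality. A minimal NumPy-only sketch of the shapes involved: .T does nothing to a 1-D ndarray, but turns a 2-D (1, n) array or an np.matrix (like the one in the question) into (n, 1), the same shape reshape(-1, 1) produces. Which shape is correct still depends on how the scaler was fitted: (n, 1) means n samples of one feature, while (1, n) means one sample of n features.

```python
import numpy as np

values = [0.0, 9.49627142e-04, 3.19029839e-04]

flat = np.array(values)           # 1-D, shape (3,)
print(flat.shape, flat.T.shape)   # (3,) (3,)  .T is a no-op on 1-D arrays

row = np.array([values])          # 2-D, shape (1, 3): one sample, three features
print(row.shape, row.T.shape)     # (1, 3) (3, 1)

mat = np.matrix(values)           # np.matrix is always 2-D: shape (1, 3)
print(mat.shape, mat.T.shape)     # (1, 3) (3, 1)

# reshape(-1, 1) gives the same (3, 1) column without relying on .T
print(flat.reshape(-1, 1).shape)  # (3, 1)
```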

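Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.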

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM