scikit-learn MinMaxScaler produces slightly different results than a NumPy implementation

Question

I compared the scikit-learn Min-Max scaler from its preprocessing module with a "manual" approach using NumPy. However, I noticed that the results are slightly different. Does anyone have an explanation for this?

Using the following equation for Min-Max scaling:

X_norm = (X - X_min) / (X_max - X_min)

which should be the same as the scikit-learn one: (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

I am using both approaches as follows:

from sklearn import preprocessing

def numpy_minmax(X):
    xmin =  X.min()
    return (X - xmin) / (X.max() - xmin)

def sci_minmax(X):
    minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1), copy=True)
    return minmax_scale.fit_transform(X)

On a random sample:

import numpy as np

np.random.seed(123)

# A random 2D array with values in [0, 100)
X = np.random.rand(100, 2)  # rand already returns float64
X *= 100

The results are slightly different:

from matplotlib import pyplot as plt

sci_mm = sci_minmax(X)
numpy_mm = numpy_minmax(X)

plt.scatter(numpy_mm[:,0], numpy_mm[:,1],
        color='g',
        label='NumPy bottom-up',
        alpha=0.5,
        marker='o'
        )

plt.scatter(sci_mm[:,0], sci_mm[:,1],
        color='b',
        label='scikit-learn',
        alpha=0.5,
        marker='x'
        )

plt.legend()
plt.grid()

plt.show()

[Image: scatter plot comparing the NumPy and scikit-learn results]

Answer 1

scikit-learn processes each feature (column) individually. So you need to specify axis=0 when taking the min; otherwise numpy.min returns the minimum over all elements of the array, not over each column separately:

>>> xs
array([[1, 2],
       [3, 4]])
>>> xs.min()
1
>>> xs.min(axis=0)
array([1, 2])

The same applies to numpy.max, so the correct function would be:

def numpy_minmax(X):
    xmin =  X.min(axis=0)
    return (X - xmin) / (X.max(axis=0) - xmin)
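
To make the difference concrete, here is a small sketch that applies both variants to the 2x2 array xs from above. The names global_scaled and column_scaled are only illustrative and are not part of the original question.

```python
import numpy as np

# The small example array from above, as floats so division is exact in type
xs = np.array([[1, 2],
               [3, 4]], dtype=float)

# One min and one max taken over every element of the array
global_scaled = (xs - xs.min()) / (xs.max() - xs.min())

# One min and one max per column, which is what scikit-learn does per feature
column_scaled = (xs - xs.min(axis=0)) / (xs.max(axis=0) - xs.min(axis=0))

print(global_scaled)
print(column_scaled)
```

With the global version only the overall smallest and largest elements map to 0 and 1. With the per-column version every column spans the full range from 0 to 1.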

With the per-column version, the two results match exactly:

[Image: exact match]
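
Since the plot is not reproduced here, the match can also be checked numerically. The following is a minimal sketch that assumes the same random sample as in the question. The function names numpy_minmax_global and numpy_minmax_per_column are introduced only for this comparison.

```python
import numpy as np
from sklearn import preprocessing

np.random.seed(123)
X = np.random.rand(100, 2) * 100

def numpy_minmax_global(X):
    # Version from the question: a single min and max over all elements
    xmin = X.min()
    return (X - xmin) / (X.max() - xmin)

def numpy_minmax_per_column(X):
    # Corrected version: min and max are taken per column (axis=0)
    xmin = X.min(axis=0)
    return (X - xmin) / (X.max(axis=0) - xmin)

sci_mm = preprocessing.MinMaxScaler(feature_range=(0, 1)).fit_transform(X)

# Compare with a tolerance rather than bit-for-bit equality
matches_global = np.allclose(numpy_minmax_global(X), sci_mm)
matches_per_column = np.allclose(numpy_minmax_per_column(X), sci_mm)
print(matches_global, matches_per_column)
```

np.allclose is used instead of strict equality because the two implementations may order their floating-point operations differently, so the results can differ in the last few bits even when they agree for all practical purposes.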

Answer 1: 11, accepted, 2014-07-13 16:41:33
