
What happens when you transform the test set using MinMaxScaler

I am currently pre-processing my data, and I understand that I have to apply the same scaling parameters used on my training set to my test set. However, when I applied the transform method from the sklearn library, I noticed something strange.

I first used preprocessing.MinMaxScaler(feature_range=(0,1)) on my training set, which sets the maximum to 1 and the minimum to 0. Next, I used minmax_scaler.transform(data) on my test set, and when I printed out the data frame I noticed values greater than 1. What can this possibly mean?

For a given feature x, min-max scaling to (0, 1) effectively maps:

x -> (x - min_train_x) / (max_train_x - min_train_x)

where min_train_x and max_train_x are the minimum and maximum values of x in the training set.

If a value of x in the test set is larger than max_train_x, the scaling transformation will return a value greater than 1.

This is usually not a big problem, unless the downstream model requires its input to be strictly within the (0, 1) range.
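As a minimal sketch of this behaviour (the numbers below are made-up illustration data, not taken from the question), fitting the scaler on a training set and then transforming a test set that contains a larger value yields a result above 1:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy data: the training column ranges from 10 to 50
X_train = np.array([[10.0], [30.0], [50.0]])
# The test set contains 70, which is larger than the training maximum
X_test = np.array([[20.0], [70.0]])

scaler = MinMaxScaler(feature_range=(0, 1))
scaler.fit(X_train)                  # learns min_train_x = 10, max_train_x = 50

print(scaler.transform(X_train))     # all values fall inside [0, 1]
print(scaler.transform(X_test))      # 70 -> (70 - 10) / (50 - 10) = 1.5, i.e. > 1

# If the downstream model really requires inputs in [0, 1], one option is to clip:
print(np.clip(scaler.transform(X_test), 0, 1))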

Actually, MinMaxScaler is used when you want your data to be in a specific range. For example, if you have data like

this 2D array:

[
[1000,2000],
[3000,4000],
[1,2],
[3,50]
]

Now, in this data, I want the minimum value to be 1 and the maximum to be 100, so I have to convert all the data into the range (1, 100). Each column is scaled independently: a value x is mapped to (x - min) / (max - min) * (100 - 1) + 1, where min and max are that column's minimum and maximum.

Now my data will become:

[
  [33.97799266, 50.47523762],
  [100, 100],
  [1, 1],
  [1.06602201, 2.1885943]
]

MinMaxScaler in Python:

from sklearn.preprocessing import MinMaxScaler

data = [[1000, 2000], [3000, 4000], [1, 2], [3, 50]]

# Rescale each column independently into the range (1, 100)
scaler = MinMaxScaler(feature_range=(1, 100))
scaler.fit(data)               # learns the per-column min and max
print(scaler.transform(data))
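As a quick sanity check (a small sketch added here, not part of the original answer), the first output value can be reproduced by hand from the per-column formula:

# Manual verification of one entry, using the same data as above
col_min, col_max = 1, 3000     # per-column min and max learned by the scaler
lo, hi = 1, 100                # target feature_range
x = 1000
print((x - col_min) / (col_max - col_min) * (hi - lo) + lo)   # ~33.978, matches 33.97799266 above

When the same fitted scaler is later applied to unseen data, values outside the per-column training min and max will land outside (1, 100), which is exactly the effect described in the question above.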
