
Scaling data with large range in Machine learning preprocessing

I am very new to Machine Learning, and I am trying to apply ML to data containing nearly 50 features. Some features range from 0 to 1,000,000, while others range from 0 to 100 or even less. When I apply feature scaling with MinMaxScaler to the range (0, 1), I think the features with large ranges get scaled down to very small values, and this might prevent me from getting good predictions.

I would like to know if there is an efficient way to do scaling so that all the features are scaled appropriately.

I also tried StandardScaler, but accuracy did not improve. Also, can I use one scaling function for some features and a different one for the remaining features?

Thanks in advance!

Feature scaling, or data normalization, is an important part of training a machine learning model. It is generally recommended that the same scaling approach be used for all features. If the scales of different features are wildly different, this can have a knock-on effect on the model's ability to learn (depending on which methods you are using). By standardizing feature values, all features are implicitly weighted equally in their representation.
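As a minimal sketch of the point above, the snippet below (with made-up data mimicking the ranges in the question: one feature spanning 0 to 1,000,000, another spanning 0 to 100) shows that MinMaxScaler maps both features onto the same [0, 1] interval, so neither dominates purely by magnitude:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: column 0 spans 0..1,000,000, column 1 spans 0..100.
X = np.array([
    [1_000_000.0, 100.0],
    [  500_000.0,  50.0],
    [        0.0,   0.0],
])

scaler = MinMaxScaler(feature_range=(0, 1))
X_scaled = scaler.fit_transform(X)
print(X_scaled)
# Both columns now span [0, 1]: [[1.0, 1.0], [0.5, 0.5], [0.0, 0.0]]
```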

Two common methods of normalization are:

  • Rescaling (also known as min-max normalization):

    x' = (x - min(x)) / (max(x) - min(x))

    where x is an original value, and x' is the normalized value. For example, suppose that we have the students' weight data, and the students' weights span [160 pounds, 200 pounds]. To rescale this data, we first subtract 160 from each student's weight and divide the result by 40 (the difference between the maximum and minimum weights).
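The weight example above can be worked through directly (the four weights here are illustrative values chosen to span the stated [160, 200] range):

```python
# Min-max rescaling: subtract the minimum (160) and divide by the
# range (200 - 160 = 40).
weights = [160.0, 170.0, 180.0, 200.0]
lo, hi = min(weights), max(weights)
rescaled = [(w - lo) / (hi - lo) for w in weights]
print(rescaled)  # [0.0, 0.25, 0.5, 1.0]
```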

  • Mean normalization

    x' = (x - mean(x)) / (max(x) - min(x))

    where x is an original value, and x' is the normalized value.
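Applying mean normalization to the same illustrative weights shows the difference from min-max rescaling: values are centered on the mean rather than mapped onto [0, 1].

```python
# Mean normalization: center on the mean, then divide by the range.
weights = [160.0, 170.0, 180.0, 200.0]
mean = sum(weights) / len(weights)   # 177.5
rng = max(weights) - min(weights)    # 40.0
normalized = [(w - mean) / rng for w in weights]
print(normalized)  # [-0.4375, -0.1875, 0.0625, 0.5625]
```

Note that the normalized values sum to zero, which is the defining property of centering on the mean.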
