
Future-proofing feature scaling in machine learning?

I have a question about how feature scaling works after training a model.

Let's say a neural network model predicts the height of a tree by training on outside temperature.

The lowest outside temperature in my training data is 60F and the max is 100F. I scale the temperature between 0 and 1 and train the model. I save the model for future predictions. Two months later, I want to predict on some new data. But this time the min and max temperatures in my test data are -20F and 50F, respectively.

How does the trained model deal with this? The range I used to scale the training set when building the model does not match the range of the test data.
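To make the mismatch concrete, here is a minimal sketch (assuming scikit-learn's MinMaxScaler and made-up temperature readings) of what happens when a scaler fitted on the 60F-100F training range is reused, unchanged, on the colder new data: the transformed values fall outside the [0, 1] range the network saw during training.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical training temperatures (F), spanning 60-100
train_temps = np.array([[60.0], [75.0], [88.0], [100.0]])

scaler = MinMaxScaler()      # maps the fitted min to 0 and the fitted max to 1
scaler.fit(train_temps)      # learns min=60, max=100 from the training data

print(scaler.transform(train_temps).ravel())
# [0.    0.375 0.7   1.   ]   -> all inside [0, 1]

# Two months later: new temperatures between -20F and 50F
new_temps = np.array([[-20.0], [0.0], [50.0]])

print(scaler.transform(new_temps).ravel())
# [-2.   -1.5  -0.25]         -> outside the range the model was trained on
```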

What would prevent me from hard-coding a range to scale to that I know the data will always fall within, say from -50F to 130F? The problem I see is with a model that has many features: if I impose a different hard-coded range on each feature, doesn't that make feature scaling essentially pointless?
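As an illustration of that hard-coded alternative (a sketch with made-up bounds for two hypothetical features), each feature still ends up in [0, 1]; the bounds are simply chosen up front instead of taken from whatever training set happens to be at hand:

```python
import numpy as np

# Hypothetical fixed bounds chosen per feature, independent of any dataset
FEATURE_BOUNDS = {
    "temperature_f": (-50.0, 130.0),   # the -50F to 130F range from the question
    "rainfall_in":   (0.0, 20.0),      # made-up bounds for a second feature
}

def scale_fixed(value, low, high):
    """Map a value from the fixed [low, high] range onto [0, 1]."""
    return (value - low) / (high - low)

# The same constants are applied at training time and at prediction time,
# so a reading of 60F always maps to the same scaled value.
print(scale_fixed(60.0,  *FEATURE_BOUNDS["temperature_f"]))   # ~0.611
print(scale_fixed(-20.0, *FEATURE_BOUNDS["temperature_f"]))   # ~0.167
print(scale_fixed(3.5,   *FEATURE_BOUNDS["rainfall_in"]))     # 0.175
```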

Different scales won't work. Your model is trained on one scale and learns that scale; if you change the scale, the model will still assume it is the same one and produce badly shifted predictions.
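A small sketch of that shift, again with scikit-learn's MinMaxScaler and made-up numbers: if the scaler is re-fitted on the new -20F to 50F data instead of being reused, 50F gets mapped to 1.0, which the trained network reads the same way it read 100F on the original scale.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

train_temps = np.array([[60.0], [100.0]])   # original range: 60-100F
new_temps   = np.array([[-20.0], [50.0]])   # new range: -20F to 50F

# Wrong: re-fitting on the new data silently changes what each value means.
refit = MinMaxScaler().fit(new_temps)
print(refit.transform([[50.0]]))   # [[1.]]   -> the model treats this like "100F"

# Consistent: keep the scaler fitted on the training data and only call transform.
orig = MinMaxScaler().fit(train_temps)
print(orig.transform([[50.0]]))    # [[-0.25]] -> same scale the model was trained on
```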

Training again will overwrite what was learned before.

So, yes, hard-code your scaling (preferably directly on your data, not inside the model).

And for good results, train with all the data you can gather.
