
Manual normalization function taking too long to execute

I am trying to implement a normalization function manually rather than using scikit-learn's. The reason is that I need to set the minimum and maximum parameters manually, and scikit-learn does not allow that.

I successfully implemented this to normalize the values between 0 and 1. But it is taking a very long time to run.

Question: Is there a more efficient way to do this? How can I make it run faster?

Shown below is my code:

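def scale(data):
    # Row-by-row loop: every iteration does a separate label lookup and write,
    # which is what makes this slow on large frames.
    for index, row in data.iterrows():
        X_std = (data.loc[index, "Close"] - 10) / (2000 - 10)
        data.loc[index, "Close"] = X_std

    return data

# Call the function only after it has been defined.
scaled_train_data = scale(train_data)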

2000 and 10 are bounds that I defined manually, rather than taking the minimum and maximum values of the dataset.

Thank you in advance.

Use NumPy arrays. You can also set your min and max manually.

import numpy as np
data = np.array(df)
_min = np.min(data, axis=0)
_max = np.max(data, axis=0)
normed_data = (data - _min) / (_max - _min)
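For the manual-bounds case from the question, the same expression works with fixed values in place of np.min and np.max. A minimal sketch, assuming a small made-up array in place of the real "Close" column (the sample values are hypothetical; 10 and 2000 are the bounds from the question):

```python
import numpy as np

# Hypothetical sample values standing in for the "Close" column.
data = np.array([10.0, 505.0, 1000.0, 2000.0])

# Manually chosen bounds from the question, instead of np.min / np.max.
_min, _max = 10.0, 2000.0

# One vectorized expression over the whole array, no Python-level loop.
normed_data = (data - _min) / (_max - _min)
```

Values that fall outside the chosen bounds would simply map outside the 0 to 1 range.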

Why loop? You can just use

train_data['Close'] = (train_data['Close'] - 10) / (2000 - 10)

to make use of vectorized numpy functions. Of course, you could also put this in a function, if you prefer.
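Wrapped in a function, this might look roughly like the sketch below. The parameter names column, lower and upper are my own additions for illustration, and the three-row DataFrame is made-up sample data; copying the frame first is a design choice so the caller's data is left untouched.

```python
import pandas as pd

def scale(data, column="Close", lower=10, upper=2000):
    """Rescale one column with manually chosen bounds (hypothetical helper)."""
    scaled = data.copy()  # leave the caller's DataFrame unmodified
    # Vectorized arithmetic on the whole column at once.
    scaled[column] = (scaled[column] - lower) / (upper - lower)
    return scaled

# Made-up sample data for illustration.
train_data = pd.DataFrame({"Close": [10.0, 1005.0, 2000.0]})
scaled_train_data = scale(train_data)
```

Because the arithmetic runs on the whole column in one step, the runtime should no longer be dominated by per-row Python overhead as it is with iterrows.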

Alternatively, if you want to rescale to a linear range, you could use scikit-learn's MinMaxScaler: http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html . The advantage of this is that you can save the fitted scaler and then rescale the test data in the same manner.
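One possible way to combine MinMaxScaler with manually chosen bounds is to fit it on the two bounds themselves rather than on the data. This is a workaround I am assuming fits the question, not a dedicated scikit-learn option. The sample arrays are made up, and pickle is used here only as one way to save the fitted scaler.

```python
import pickle

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Fit on the manual bounds (10 and 2000) so they become the scaler's min and max.
scaler = MinMaxScaler()
scaler.fit(np.array([[10.0], [2000.0]]))

# Made-up train and test values, shaped (n_samples, 1) as scikit-learn expects.
train = np.array([[10.0], [1005.0], [2000.0]])
test = np.array([[505.0], [2500.0]])

scaled_train = scaler.transform(train)

# Save and restore the fitted scaler, then rescale the test data the same way.
restored = pickle.loads(pickle.dumps(scaler))
scaled_test = restored.transform(test)
```

With the default settings, test values beyond the bounds (such as 2500 here) end up outside the 0 to 1 range rather than being clipped.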
