I need to normalize a dataframe containing one index column and other columns with numeric values.
Index a b c
xy1 555 436 3667
xz2 4626 658 463
xr3 425 674 436
bx4 4636 6567 6346
I want to perform max-min normalization on the dataframe, drop columns containing NaNs, and return the normalized dataframe with the original index. I'm thinking of something like this, but how can I exclude the index column from the loop, so that it stays the same in the returned dataframe?
def normalize(df):
    result = df.copy()
    for feature_name in df.columns:
        max_value = df[feature_name].max()
        min_value = df[feature_name].min()
        result[feature_name] = (df[feature_name] - min_value) / (max_value - min_value)
        if result[feature_name].isnull().values.any():
            result.drop([feature_name], axis=1, inplace=True)
            print(f'Something wrong in {feature_name}, dropping this feature.')
    return result
You can simplify your implementation of min-max scaling:
s = df.set_index('Index').dropna(axis=1)
s = (s - s.min()) / (s.max() - s.min())
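As a minimal sketch, the same idea can be wrapped into a function that also handles a case the two lines above do not: a constant column has max equal to min, so the division yields NaN after scaling rather than before. The parameter name index_col and the extra columns d and e are assumptions added purely for illustration, and the sketch assumes every non-index column is numeric.

```python
import pandas as pd


def normalize(df, index_col="Index"):
    """Min-max scale every numeric column, keeping index_col as the index."""
    # Move the label column into the index so it is excluded from the arithmetic.
    s = df.set_index(index_col)
    # Drop columns that already contain missing values.
    s = s.dropna(axis=1)
    value_range = s.max() - s.min()
    # A constant column has zero range and would turn into NaN (0 / 0), so drop it too.
    s = s.loc[:, value_range != 0]
    return (s - s.min()) / (s.max() - s.min())


df = pd.DataFrame({
    "Index": ["xy1", "xz2", "xr3", "bx4"],
    "a": [555, 4626, 425, 4636],
    "b": [436, 658, 674, 6567],
    "c": [3667, 463, 436, 6346],
    "d": [7, 7, 7, 7],           # hypothetical constant column
    "e": [1.0, None, 3.0, 4.0],  # hypothetical column with a missing value
})
result = normalize(df)
print(result)
```

Here d and e are both removed, and a, b and c are scaled to the [0, 1] range with the original labels kept as the index.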
Or, you can use MinMaxScaler from sklearn.preprocessing:
from sklearn.preprocessing import MinMaxScaler
s = df.set_index('Index').dropna(axis=1)
s[:] = MinMaxScaler().fit_transform(s)
print(s)
              a         b         c
Index
xy1    0.030872  0.000000  0.546701
xz2    0.997625  0.036209  0.004569
xr3    0.000000  0.038819  0.000000
bx4    1.000000  1.000000  1.000000