How do you write a Python function for StandardScaler?

Question

I am trying to write more concise code for a project. I have created new variables to rescale certain columns of a pandas dataframe, and I would like to create a function that does this more efficiently. Does anyone have any ideas or resources on how I can accomplish this?

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

whole_scaled = scaler.fit_transform(df_milk_types['Whole'].values.reshape(-1, 1))
red_fat_scaled = scaler.fit_transform(df_milk_types['Two Percent Fat'].values.reshape(-1, 1))
low_fat_scaled = scaler.fit_transform(df_milk_types['One Percent Fat'].values.reshape(-1, 1))
skim_scaled = scaler.fit_transform(df_milk_types['Skim'].values.reshape(-1, 1))

Answer 1

Well, the method that you are using is already efficient enough. But if you want to implement it yourself, you can do something like:

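import numpy as np

def Standardize(x):
    x = np.asarray(x)
    # Subtract the mean and divide by the standard deviation (np.std defaults to ddof=0)
    return (x - np.mean(x)) / np.std(x)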

But keep in mind that, written like this, you will not be able to apply a reverse transform, because you lose the mean and the standard deviation of the original data.
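
One way around this, sketched here on the assumption that you are scaling a single 1-D column, is to return the mean and standard deviation together with the scaled values. The names `standardize_with_stats` and `unstandardize` are hypothetical helpers for illustration, not part of NumPy or scikit-learn, and the sample values are made up:

```python
import numpy as np

def standardize_with_stats(x):
    """Return the standardized values plus the mean and std that were used."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()  # population std (ddof=0), the NumPy default
    return (x - mean) / std, mean, std

def unstandardize(z, mean, std):
    """Undo the scaling using the stored statistics."""
    return np.asarray(z, dtype=float) * std + mean

values = [1.0, 2.0, 3.0, 4.0]  # made-up sample data
scaled, mean, std = standardize_with_stats(values)
restored = unstandardize(scaled, mean, std)
```

Keeping `mean` and `std` around is essentially what a fitted StandardScaler does for you internally.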

Anyway, applying the function is now trivial:

df.loc[:, 'column_name'] = Standardize(df.loc[:, 'column_name'])

Another thing to keep in mind is that when the dataframe has a very large number of rows, np.std may raise a MemoryError, since it allocates temporary arrays the size of its input.
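
As for the original question, StandardScaler can also scale several columns in a single call, because it standardizes each column independently. A minimal sketch, assuming `df_milk_types` has the four columns named in the question; the numbers below are made up for illustration:

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for the asker's dataframe
df_milk_types = pd.DataFrame({
    'Whole': [10.0, 12.0, 14.0, 16.0],
    'Two Percent Fat': [20.0, 21.0, 25.0, 30.0],
    'One Percent Fat': [5.0, 7.0, 6.0, 8.0],
    'Skim': [3.0, 2.0, 4.0, 1.0],
})

columns = ['Whole', 'Two Percent Fat', 'One Percent Fat', 'Skim']

scaler = StandardScaler()
# fit_transform returns a NumPy array; wrap it to keep column names and index
scaled = pd.DataFrame(
    scaler.fit_transform(df_milk_types[columns]),
    columns=columns,
    index=df_milk_types.index,
)

# The fitted scaler keeps each column's mean and scale, so the transform is reversible
original = scaler.inverse_transform(scaled)
```

This replaces the four separate `fit_transform` calls and the `reshape(-1, 1)` workaround, and a single fitted scaler can still invert every column.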
