
How do you write a Python function for StandardScaler?

I am trying to write more concise code for a project. I have created new variables to rescale certain columns of a pandas dataframe, and I would like to create a function that does this more efficiently. Does anyone have any ideas or resources on how I can accomplish this?

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()

# Each call refits the scaler on one column, reshaped to the 2-D shape it expects
whole_scaled = scaler.fit_transform(df_milk_types['Whole'].values.reshape(-1, 1))
red_fat_scaled = scaler.fit_transform(df_milk_types['Two Percent Fat'].values.reshape(-1, 1))
low_fat_scaled = scaler.fit_transform(df_milk_types['One Percent Fat'].values.reshape(-1, 1))
skim_scaled = scaler.fit_transform(df_milk_types['Skim'].values.reshape(-1, 1))

The method you are using is already efficient enough. If you want to implement it yourself, though, you can do something like:

import numpy as np

def Standardize(x):
    # Subtract the mean and divide by the population standard deviation (ddof=0)
    x = np.asarray(x)
    return (x - np.mean(x)) / np.std(x)

Keep in mind that, written like this, you will not be able to apply a reverse transform, because you lose the mean and the standard deviation of the original data.
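If the reverse transform matters, one option is to return the statistics together with the scaled values. This is only a minimal sketch: `standardize_with_stats` and `unstandardize` are hypothetical helper names, not part of NumPy or scikit-learn, and the sample values are made up.

```python
import numpy as np

def standardize_with_stats(x):
    """Return the standardized values plus the mean and std that were used."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()  # population std (ddof=0), NumPy's default
    return (x - mean) / std, mean, std

def unstandardize(z, mean, std):
    """Undo standardize_with_stats using the saved statistics."""
    return np.asarray(z, dtype=float) * std + mean

# Made-up sample values, for illustration only
values = [1.0, 2.0, 3.0, 4.0]
scaled, mean, std = standardize_with_stats(values)
restored = unstandardize(scaled, mean, std)
```

Holding on to `mean` and `std` plays the same role as keeping a fitted `StandardScaler` object around and calling its `inverse_transform`.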

Anyway, applying the function is now trivial:

df.loc[:, 'column_name'] = Standardize(df.loc[:, 'column_name'])
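For several columns at once, `StandardScaler` itself can do the work in one call, since it standardizes each column of a 2-D input independently. The following is a rough sketch: `scale_columns` is a hypothetical helper name, and the dataframe contents are invented sample values that only reuse the column names from the question.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scale_columns(df, columns):
    """Return a copy of df with the given columns standardized, plus the fitted scaler."""
    scaler = StandardScaler()
    out = df.copy()
    # fit_transform computes a separate mean and std for every column
    out[columns] = scaler.fit_transform(df[columns])
    return out, scaler

# Invented sample data; only the column names come from the question
df_milk_types = pd.DataFrame({
    "Whole": [10.0, 12.0, 14.0, 16.0],
    "Two Percent Fat": [5.0, 7.0, 6.0, 8.0],
    "One Percent Fat": [2.0, 2.5, 3.0, 3.5],
    "Skim": [1.0, 4.0, 2.0, 3.0],
})
columns = ["Whole", "Two Percent Fat", "One Percent Fat", "Skim"]
scaled_df, fitted_scaler = scale_columns(df_milk_types, columns)

# The fitted scaler keeps the means and stds, so the scaling can be undone
restored = fitted_scaler.inverse_transform(scaled_df[columns])
```

Returning the fitted scaler alongside the dataframe is a design choice of this sketch: it keeps `inverse_transform` available later, which the hand-written version does not.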

Another thing to keep in mind is that when the dataframe has a very large number of rows, np.std can raise a MemoryError, because it allocates temporary arrays as large as the input.
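One possible way to reduce peak memory is to let `StandardScaler` accumulate the mean and variance incrementally with `partial_fit`, feeding it one slice of the data at a time. This is a sketch under assumptions: `fit_scaler_in_chunks` and the chunk size are hypothetical, the data is randomly generated, and whether it avoids the error in practice depends on the data size and the available memory.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def fit_scaler_in_chunks(values, chunk_size=100_000):
    """Fit a StandardScaler on a 1-D array by feeding it one chunk at a time."""
    scaler = StandardScaler()
    values = np.asarray(values, dtype=float)
    for start in range(0, len(values), chunk_size):
        # partial_fit expects a 2-D array, so each chunk becomes a single column
        chunk = values[start:start + chunk_size].reshape(-1, 1)
        scaler.partial_fit(chunk)
    return scaler

# Randomly generated data, for illustration only
rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1_000)

scaler = fit_scaler_in_chunks(data, chunk_size=128)
scaled = scaler.transform(data.reshape(-1, 1)).ravel()
```

Note that `transform` here still allocates a full-size output array; for truly large data the same chunked loop could be applied to the transform step as well.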

