
Normalization of a matrix

I have a 150x4 matrix X which I created from a pandas dataframe using the following code:

X = df_new.as_matrix()

I have to normalize it using this function:

[image: the normalization equation]

I know that Uj is the mean value of j, and that σj is the standard deviation of j, but I don't understand what j is. I'm also having a little trouble understanding what the bar on X is, and I'm confused by the commas in the equation (I don't know if they have any significance or not).

Can anyone help me understand what this equation means so I can then write the normalization using sklearn?

The indexes for matrix X are row (i) and column (j). Hence, X,j means column j of matrix X. I.e., normalize each column of matrix X to z-scores.
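Going by the description in the question (a bar over X, a comma in the subscript, and a mean and standard deviation indexed by j), the equation presumably has the form below. The notation is a reconstruction and not copied from the original image: μ_j stands for the symbol written as Uj in the question, n is the number of rows (150 here), and whether σ_j divides by n or by n − 1 is not stated in the question.

```latex
\bar{X}_{,j} = \frac{X_{,j} - \mu_j}{\sigma_j},
\qquad
\mu_j = \frac{1}{n} \sum_{i=1}^{n} X_{i,j},
\qquad
\sigma_j = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( X_{i,j} - \mu_j \right)^2}
```

Read this way, the bar simply marks the normalized version of column j, and the comma separates an omitted row index from the column index j.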

You can do that using pandas:

df_new_zscores = (df_new - df_new.mean()) / df_new.std()
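One detail worth checking with the pandas one-liner: `DataFrame.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `std` defaults to `ddof=0`. The sketch below shows both variants on a made-up frame; `df_new` here is a hypothetical stand-in with random data, not the asker's actual dataframe.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_new: 150 rows, 4 numeric columns.
rng = np.random.default_rng(0)
df_new = pd.DataFrame(rng.normal(size=(150, 4)), columns=list("abcd"))

# Default pandas behaviour: sample standard deviation (ddof=1).
z_sample = (df_new - df_new.mean()) / df_new.std()

# Population standard deviation (ddof=0), matching NumPy's default.
z_population = (df_new - df_new.mean()) / df_new.std(ddof=0)

# Each variant has unit spread under its own convention.
print(z_sample.std().to_numpy())
print(z_population.std(ddof=0).to_numpy())
```

For 150 rows the two results differ only by a factor of sqrt(149/150), so the choice rarely matters in practice, but it explains small mismatches when comparing against other libraries.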

You don't actually need to write code for the normalization yourself - it comes ready with sklearn.preprocessing.scale.

Here is an example from the docs:

>>> from sklearn import preprocessing
>>> import numpy as np
>>> X_train = np.array([[ 1., -1.,  2.],
...                     [ 2.,  0.,  0.],
...                     [ 0.,  1., -1.]])
>>> X_scaled = preprocessing.scale(X_train)

>>> X_scaled                                          
array([[ 0.  ..., -1.22...,  1.33...],
       [ 1.22...,  0.  ..., -0.26...],
       [-1.22...,  1.22..., -1.06...]])

When used with the default setting axis=0, the normalization happens column-wise (i.e., for each column j, as in your equation). As a result, it is easy to confirm that the scaled data has zero mean and unit variance:

>>> X_scaled.mean(axis=0)
array([ 0.,  0.,  0.])

>>> X_scaled.std(axis=0)
array([ 1.,  1.,  1.])
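To connect this back to the 150x4 case in the question, here is a minimal sketch comparing a hand-written column-wise z-score with `preprocessing.scale` and with `StandardScaler`. The matrix `X` is random data standing in for the asker's matrix, which is an assumption for illustration only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, scale

# Hypothetical stand-in for the 150x4 matrix X from the question.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(150, 4))

# Manual version: subtract each column's mean, divide by each column's std.
# axis=0 collapses the rows, giving one statistic per column j.
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# The same thing via scikit-learn, as a function and as a transformer.
X_scaled = scale(X)
X_transformed = StandardScaler().fit_transform(X)

print(np.allclose(X_manual, X_scaled))
print(np.allclose(X_manual, X_transformed))
```

`StandardScaler` is the more convenient choice when the same column means and standard deviations have to be reused later, for example to transform test data with statistics learned from training data.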

I do not know pandas, but I think the equation means that the normalized matrix is given by [image: equation]. You subtract the empirical mean and divide by the empirical standard deviation, per column.

You sometimes use this for Principal Component Analysis. 有时您将其用于主成分分析。


 