Can I standardize my count vectors after applying PCA?

Question

I applied CountVectorizer() to my X_train, and it returned a sparse matrix.

Usually, to standardize a sparse matrix, we pass the with_mean=False parameter:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)

In my case, however, after applying CountVectorizer to X_train I also ran PCA (TruncatedSVD) to reduce the dimensionality. My data is no longer a sparse matrix.

So can I now apply StandardScaler() directly, without passing with_mean=False (i.e. with the default with_mean=True)?

Answer 1

If you look at what the with_mean parameter does, you'll find that it simply centers your data before scaling. You don't center a sparse matrix because subtracting the mean turns it into a dense matrix, which destroys the sparsity and takes up far more memory.
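To make the sparsity point concrete, here is a toy sketch in plain Python (no scikit-learn involved). The column values are made up for illustration; the only claim is that subtracting a nonzero mean turns every zero entry into a nonzero one.

```python
# Hypothetical column of token counts: mostly zeros, as in a CountVectorizer output.
column = [0, 0, 0, 4, 0, 0, 0, 0]

# Centering subtracts the column mean from every entry.
mean = sum(column) / len(column)
centered = [x - mean for x in column]

# Count how many entries would have to be stored explicitly before and after.
nonzeros_before = sum(1 for x in column if x != 0)
nonzeros_after = sum(1 for x in centered if x != 0)

print(nonzeros_before, nonzeros_after)
```

A sparse format stores only the nonzero entries, so after centering there is nothing left to save.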

After PCA, your data has fewer dimensions and is already dense, so it can be centered before scaling. So yes, you can apply StandardScaler() directly.
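The whole flow from the question can be sketched as follows. This is a minimal example under assumptions: the `docs` list and `n_components=2` are placeholders chosen for illustration, not values from the question.

```python
from scipy.sparse import issparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical training documents standing in for X_train.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "the mat was red",
    "a red dog sat down",
]

# CountVectorizer returns a SciPy sparse matrix.
X_counts = CountVectorizer().fit_transform(docs)

# Centering a sparse matrix is refused by StandardScaler.
try:
    StandardScaler().fit(X_counts)
    centering_refused = False
except ValueError:
    centering_refused = True

# TruncatedSVD accepts sparse input and returns a dense NumPy array.
X_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_counts)

# The reduced data is dense, so the default with_mean=True works.
X_scaled = StandardScaler().fit_transform(X_reduced)

print(issparse(X_counts), issparse(X_reduced), centering_refused)
print(X_scaled.mean(axis=0))
```

On real data, fit the vectorizer, the SVD, and the scaler on the training set only, then call `transform` on the test set.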

Answer 1
1 · Accepted · 2019-03-08 18:08:00
