
Can I standardize my PCA applied count vector?

I applied CountVectorizer() to my X_train, and it returned a sparse matrix.

Usually, if we want to standardize a sparse matrix, we pass the with_mean=False parameter:

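from sklearn.preprocessing import StandardScaler
# with_mean=False skips centering, so the sparse matrix stays sparse
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)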

But in my case, after applying CountVectorizer to my X_train, I also performed PCA (TruncatedSVD) to reduce the dimensionality. Now my data is no longer a sparse matrix.

So can I now apply StandardScaler() directly, without passing with_mean=False (i.e. with the default with_mean=True)?

If you look at what the with_mean parameter does, you'll find that it simply centers your data before scaling. The reason you don't center a sparse matrix is that subtracting the column means turns most of the zero entries into non-zero values, so the result is effectively a dense matrix that occupies much more memory, which defeats the purpose of using a sparse representation in the first place.
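To make the point about centering concrete, here is a minimal sketch on a small hand-made sparse matrix (the values are made up for illustration and are not from the question). It assumes a reasonably recent scikit-learn, where StandardScaler rejects sparse input with a ValueError when with_mean=True, rather than silently densifying it.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: a 3x3 sparse matrix with four non-zero entries.
X_sparse = sparse.csr_matrix(np.array([[0.0, 2.0, 0.0],
                                       [1.0, 0.0, 0.0],
                                       [0.0, 4.0, 3.0]]))

# Scaling without centering maps zeros to zeros, so the result stays sparse.
X_scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
stays_sparse = sparse.issparse(X_scaled)

# Centering would make most zeros non-zero; scikit-learn refuses to do it
# on sparse input and raises a ValueError instead.
try:
    StandardScaler(with_mean=True).fit_transform(X_sparse)
    centering_rejected = False
except ValueError:
    centering_rejected = True

print(stays_sparse, centering_rejected)
```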

After you perform PCA (TruncatedSVD), your data has far fewer dimensions and is already a dense array, so it can be centered before scaling without any memory concern. So yes, you can apply StandardScaler() directly.
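The whole sequence might look like the sketch below. The toy corpus, the variable names, and n_components=2 are assumptions chosen for illustration; the question does not show its actual data or settings.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

# Hypothetical toy corpus standing in for X_train.
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "dogs and cats are pets",
    "stock markets fell sharply today",
    "investors sold stock today",
]

# CountVectorizer returns a sparse count matrix.
counts = CountVectorizer().fit_transform(docs)

# TruncatedSVD accepts sparse input and returns a dense NumPy array.
svd = TruncatedSVD(n_components=2, random_state=0)
reduced = svd.fit_transform(counts)

# The data is dense now, so the default with_mean=True is fine.
scaled = StandardScaler().fit_transform(reduced)

print(type(reduced).__name__, scaled.mean(axis=0), scaled.std(axis=0))
```

When the same steps are applied to held-out data, the fitted vectorizer, SVD, and scaler would be reused via transform() rather than fit again.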


 