Can I standardize my count vectors after applying PCA?
I have applied CountVectorizer() on my X_train and it returned a sparse matrix.
Usually, when we want to standardize a sparse matrix, we pass the with_mean=False parameter:
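from sklearn.preprocessing import StandardScaler
# with_mean=False: scale each column without centering it, so the matrix stays sparse
scaler = StandardScaler(with_mean=False)
X_train = scaler.fit_transform(X_train)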
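To see what with_mean=False buys you on sparse input, here is a minimal sketch. The toy matrix X_counts is hypothetical and only stands in for the CountVectorizer output; the check is that the scaled result is still sparse, stores the same number of non-zeros, and has unit standard deviation per column.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler

# Hypothetical toy count matrix standing in for CountVectorizer output
X_counts = sp.csr_matrix(np.array([
    [1, 0, 0, 2],
    [0, 0, 3, 0],
    [0, 1, 0, 0],
    [4, 0, 0, 1],
]))

# Divide each column by its standard deviation only; no mean subtraction
scaler = StandardScaler(with_mean=False)
X_scaled = scaler.fit_transform(X_counts)

print(sp.issparse(X_scaled))                        # is the result still sparse?
print(X_counts.nnz, X_scaled.nnz)                   # stored non-zeros before/after
print(X_scaled.toarray().std(axis=0))               # per-column standard deviation
```

Because zeros divided by a scale factor stay zero, the sparsity pattern is untouched; only the stored values change.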
But in my case, after applying CountVectorizer on my X_train, I have also performed PCA (TruncatedSVD) to reduce the dimensions. Now my data is not a sparse matrix.
So can I now apply StandardScaler() directly, without passing with_mean=False (i.e. with with_mean=True)?
If you look at what the with_mean parameter does, you'll find that it simply centers your data (subtracts each column's mean) before scaling. The reason you don't center a sparse matrix is that subtracting a non-zero mean turns almost every zero entry into a non-zero value. The matrix effectively becomes dense and occupies much more memory, destroying the sparsity that made it efficient in the first place. StandardScaler refuses to do this and raises a ValueError if you pass a sparse matrix with with_mean=True.
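A small sketch of why centering and sparsity do not mix, assuming a hypothetical random sparse matrix X_sparse (1000 x 500, about 1% non-zeros) built with scipy.sparse.random. It first shows StandardScaler rejecting centering on sparse input, then centers a dense copy by hand to measure how many entries become non-zero.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.preprocessing import StandardScaler

# Hypothetical sparse matrix: 1000 x 500 with roughly 1% non-zeros
X_sparse = sp.random(1000, 500, density=0.01, format="csr", random_state=0)

# StandardScaler refuses to center sparse input
try:
    StandardScaler(with_mean=True).fit_transform(X_sparse)
    centering_refused = False
except ValueError as err:
    centering_refused = True
    print("ValueError:", err)

# Centering a dense copy by hand shows why: subtracting the column means
# turns (almost) every zero into a non-zero value
X_dense = X_sparse.toarray()
X_centered = X_dense - X_dense.mean(axis=0)

print("non-zero fraction before:", np.count_nonzero(X_dense) / X_dense.size)
print("non-zero fraction after: ", np.count_nonzero(X_centered) / X_centered.size)
```

On a real vocabulary-sized matrix the dense, centered version would need memory proportional to n_samples * n_features instead of the number of stored non-zeros.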
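After you perform PCA (TruncatedSVD), your data is a dense array with only a reduced number of columns, so there is no sparsity left to preserve and the memory cost of centering is small. It can therefore be centered before scaling. So yes, you can apply StandardScaler() directly; with_mean=True is already the default.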
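An end-to-end sketch of the order discussed above: CountVectorizer, then TruncatedSVD, then a default StandardScaler. The corpus, n_components=3 and random_state=0 are hypothetical choices for illustration only; the point is that the SVD output is a plain NumPy array, so centering is safe.

```python
import numpy as np
from scipy.sparse import issparse
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.preprocessing import StandardScaler

# Hypothetical toy corpus standing in for the question's X_train
X_train = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
    "the mat is on the floor",
    "dogs chase cats",
    "a log on the floor",
]

counts = CountVectorizer().fit_transform(X_train)      # sparse document-term matrix
svd = TruncatedSVD(n_components=3, random_state=0)
reduced = svd.fit_transform(counts)                    # dense ndarray, shape (6, 3)

# The reduced data is dense, so centering does not destroy any sparsity
scaler = StandardScaler()                              # with_mean=True by default
X_train_scaled = scaler.fit_transform(reduced)

print(issparse(counts), isinstance(reduced, np.ndarray))
print(X_train_scaled.mean(axis=0).round(6))            # per-column means
print(X_train_scaled.std(axis=0).round(6))             # per-column standard deviations
```

At prediction time the same fitted objects would be reused with transform() on the test data, so the test set is centered and scaled with the training statistics.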