简体   繁体   中英

Scaling issues with scipy.sparse matrix while using scikit

While solving a machine learning problem using scikit (python) I need to do scaling of scipy.sparse matrix before doing the training using SVM in order to achieve higher accuracy. But its clearly mentioned here , that:

scale and StandardScaler accept scipy.sparse matrices as input only when with_mean=False is explicitly passed to the constructor. Otherwise a ValueError will be raised as silently centering would break the sparsity and would often crash the execution by allocating excessive amounts of memory unintentionally.

This means that I cannot have zero mean with this. So how do I scale this sparse matrix so that it has zero mean too along with unit variance. I also need to store this 'scaling' so that I can use the same transformation on the test matrix to scale it as well.

If the matrix is small, you can densify it with X.toarray() . If the matrix is large, then this will probably blow your RAM.

As an alternative to mean-centering and scaling, you can try per-sample normalization with sklearn.preprocessing.Normalizer ; this is appropriate for frequency features (eg in text classification).

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM