Efficiently applying a threshold function to SciPy sparse csr_matrix

I have a SciPy csr_matrix (a vector in this case) of 1 column and x rows. In it are float values which I need to convert to the discrete class labels -1, 0 and 1. This should be done with a threshold function which maps the float values to one of these 3 class labels.

Is there no way other than iterating over the elements, as described in "Iterating through a scipy.sparse vector (or matrix)"? I would love an elegant way to simply map a threshold function over all elements.

Note that while it is of type csr_matrix, it isn't actually sparse as it's just the return of another function where a sparse matrix was involved.

If you have an array, you can discretize it based on a condition with the np.where function, e.g.:

>>> import numpy as np
>>> x = np.arange(10)
>>> np.where(x < 5, 0, 1)
array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

The syntax is np.where(BOOLEAN_ARRAY, VALUE_IF_TRUE, VALUE_IF_FALSE). You can nest two np.where calls to handle multiple conditions:

>>> np.where(x < 3, -1, np.where(x > 6, 0, 1))
array([-1, -1, -1,  1,  1,  1,  1,  0,  0,  0])
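For the three class labels in the question, the same nesting can be wrapped in a small helper. This is only a sketch: the name threshold_labels and the cutoffs lower=-0.5 and upper=0.5 are placeholders chosen for illustration, not values taken from the question.

```python
import numpy as np

def threshold_labels(values, lower=-0.5, upper=0.5):
    """Map floats to -1 (below lower), 1 (above upper), and 0 otherwise.

    The cutoffs are hypothetical defaults; substitute your own.
    """
    values = np.asarray(values, dtype=float)
    # The inner np.where only decides between 1 and 0; the outer one
    # overrides it with -1 wherever the value is below the lower cutoff.
    return np.where(values < lower, -1, np.where(values > upper, 1, 0))

labels = threshold_labels([-2.0, -0.1, 0.0, 0.3, 1.7])
print(labels)
```

Because np.where works on whole arrays, no Python-level loop over the elements is needed.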

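To apply this to your data in a CSR or CSC sparse matrix, you can use the .data attribute, which gives you access to the internal array holding all explicitly stored (normally nonzero) entries of the matrix. Implicit zeros are not part of that array, so they are left untouched; this is why the first row in the output below stays 0 instead of becoming -1. For example: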

>>> from scipy import sparse
>>> mat = sparse.csr_matrix(x.reshape(10, 1))
>>> mat.data = np.where(mat.data < 3, -1, np.where(mat.data > 6, 0, 1))
>>> mat.toarray()
array([[ 0],
       [-1],
       [-1],
       [ 1],
       [ 1],
       [ 1],
       [ 1],
       [ 0],
       [ 0],
       [ 0]])
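Since the vector in the question is not really sparse, two variations may be worth considering. The sketch below assumes a hypothetical threshold_labels helper with made-up cutoffs of -0.5 and 0.5. Option 1 converts to a dense array so every element, including implicit zeros, goes through the threshold. Option 2 stays sparse, which is only equivalent when the threshold maps 0.0 to the label 0, and then calls eliminate_zeros() to drop entries that were relabelled to 0.

```python
import numpy as np
from scipy import sparse

def threshold_labels(values, lower=-0.5, upper=0.5):
    # Hypothetical cutoffs, for illustration only.
    return np.where(values < lower, -1, np.where(values > upper, 1, 0))

# A one-column CSR "vector"; the 0.0 entry is not stored explicitly.
vec = sparse.csr_matrix(np.array([[-2.0], [0.0], [0.3], [1.7]]))

# Option 1: densify, then threshold every element.
dense_labels = threshold_labels(vec.toarray().ravel())

# Option 2: threshold only the stored entries and keep the result sparse.
# This matches option 1 only because 0.0 maps to label 0 here.
labelled = vec.copy()
labelled.data = threshold_labels(labelled.data)
labelled.eliminate_zeros()  # drop stored entries whose label is 0

print(dense_labels)
print(labelled.toarray().ravel())
```

If the threshold would send 0.0 to a nonzero label, only the dense route gives the intended result.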


 