I am creating a matrix from a Pandas dataframe as follows:
dense_matrix = np.array(df.as_matrix(columns = None), dtype=bool).astype(np.int)
I then convert it into a sparse matrix with:
sparse_matrix = scipy.sparse.csr_matrix(dense_matrix)
Is there any way to go from a df straight to a sparse matrix?
Thanks in advance.
df.values is a NumPy array, and accessing the values that way is generally faster than wrapping the DataFrame in np.array.
scipy.sparse.csr_matrix(df.values)
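You might need to take the transpose first, like df.values.T, if you want the DataFrame's columns to become the rows of the matrix. In a DataFrame, the rows (index) are axis 0 and the columns are axis 1, and df.values keeps that orientation.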
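For reference, here is a minimal sketch of the dense route on current library versions. DataFrame.as_matrix was removed in pandas 1.0 and np.int was removed in NumPy 1.24, so the sketch uses df.to_numpy() and np.int64 instead. The example DataFrame and its column names are made up for illustration, and the "nonzero becomes 1" step mirrors the bool cast in the question.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Hypothetical example data; any numeric DataFrame would do.
df = pd.DataFrame({"a": [0, 2, 0], "b": [0, 0, 5]})

# Mirror the question's pipeline: every nonzero entry becomes 1.
dense = (df.to_numpy() != 0).astype(np.int64)

# Build the CSR matrix from the dense array (rows stay rows, no transpose).
sparse_matrix = csr_matrix(dense)
```

This still materializes the full dense array in memory, so it is only suitable when the DataFrame itself fits comfortably.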
There is a way to do this without converting to a dense matrix along the way: csr_sparse_matrix = df.sparse.to_coo().tocsr()
Solution:
import pandas as pd
from scipy.sparse import csr_matrix
# Use a separate result name so the imported csr_matrix function is not overwritten.
sparse_csr = csr_matrix(df.astype(pd.SparseDtype("float64", 0)).sparse.to_coo())
Explanation:
to_coo needs the pd.DataFrame to be in a sparse format, so the dataframe will need to be converted to a sparse datatype: df.astype(pd.SparseDtype("float64",0))
After it is converted to a COO matrix, it can be converted to a CSR matrix.
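The steps above can be sketched end to end as follows. This is only an illustration under assumptions: the example DataFrame is invented, and the float64 dtype with a fill value of 0 is taken from the answer rather than being the only valid choice.

```python
import pandas as pd

# Hypothetical example data with mostly zero entries.
df = pd.DataFrame({"a": [0.0, 2.0, 0.0], "b": [0.0, 0.0, 5.0]})

# Step 1: convert every column to a sparse dtype, treating 0 as the fill value.
sparse_df = df.astype(pd.SparseDtype("float64", 0))

# Step 2: the .sparse accessor can now export a SciPy COO matrix.
coo = sparse_df.sparse.to_coo()

# Step 3: convert COO to CSR for fast row slicing and arithmetic.
csr = coo.tocsr()
```

Only the two nonzero entries are stored in the result, and csr.toarray() can be used to check it against the original values on small inputs.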