
Converting Pandas DataFrame to sparse matrix

Here is my code:

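import numpy as np
import pandas as pd
import scipy.sparse as sp

# One row per user, one column per movie: 1 if the user interacted with the movie, else 0
data = pd.get_dummies(data['movie_id'], dtype=int).groupby(data['user_id']).max()
df = pd.DataFrame(data)

# Turn every 0 (no interaction) into -1
replace = df.replace(0, np.nan)
t = replace.fillna(-1)

sparse = sp.csr_matrix(t.values)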

My data consists of two columns, user_id and movie_id:

user_id   movie_id
5         1000
6         1007

I want to convert the data to a sparse matrix. I first created an interaction matrix in which rows are user_id values and columns are movie_id values, with +1 for a positive interaction and -1 for a negative one (in my code, any user/movie pair that does not appear in the data). Then I converted it to a sparse matrix using scipy. My result looks like this:

(0,0) -1
(0,1) -1
(0,2) 1
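The positional indices appear because t.values is a plain NumPy array: the user and movie IDs live in t.index and t.columns, not in the array itself. The sketch below reruns the pipeline on the two sample rows (it collapses the replace/fillna pair into a single replace(0, -1), which yields the same values) and translates each stored (row, col) position back to its (user_id, movie_id) labels. The name labelled is hypothetical, chosen for illustration.

```python
import pandas as pd
import scipy.sparse as sp

# The two sample rows from the question
data = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# Users as rows, movies as columns, +1 for an interaction and -1 otherwise
t = (pd.get_dummies(data["movie_id"], dtype=int)
     .groupby(data["user_id"]).max()
     .replace(0, -1))

sparse = sp.csr_matrix(t.values)  # only the numbers survive; labels are dropped

print(t.index.tolist())    # user ID for row 0, row 1, ...
print(t.columns.tolist())  # movie ID for column 0, column 1, ...

# Map each stored positional entry back to (user_id, movie_id)
coo = sparse.tocoo()
labelled = [
    ((int(t.index[r]), int(t.columns[c])), int(v))
    for r, c, v in zip(coo.row, coo.col, coo.data)
]
print(labelled)
```

Because every cell is either +1 or -1, nothing is zero here, so the "sparse" matrix actually stores every cell of the interaction matrix.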

But what I actually want is this:

(1000,0) -1
(1000,1) 1
(1007,0) -1

Any help would be appreciated.

If you have both the row and the column index of every entry (in your case movie_id and user_id, respectively), it is best to build the matrix in COO (coordinate) format.

You can convert it into a sparse format like so:

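import numpy as np
import scipy.sparse
# data: the original two-column DataFrame (before get_dummies); one value (+1) per (movie_id, user_id) pair
sparse_mat = scipy.sparse.coo_matrix((np.ones(len(data)), (data.movie_id, data.user_id)))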
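A runnable sketch of the same idea on the two sample rows from the question. It assumes every listed (user_id, movie_id) pair is a positive interaction stored as +1; the answer does not say which values to store, so the values array is an assumption.

```python
import numpy as np
import pandas as pd
import scipy.sparse

# The two-column sample from the question, one row per (user, movie) interaction
data = pd.DataFrame({"user_id": [5, 6], "movie_id": [1000, 1007]})

# Assumption: each listed pair is a positive interaction worth +1
values = np.ones(len(data), dtype=int)

# movie_id becomes the row index and user_id the column index
sparse_mat = scipy.sparse.coo_matrix((values, (data.movie_id, data.user_id)))

# COO keeps the entries in the order they were given
for r, c, v in zip(sparse_mat.row, sparse_mat.col, sparse_mat.data):
    print((int(r), int(c)), int(v))
```

Any pair not listed is an implicit 0, so -1 entries would need their own (movie_id, user_id) coordinates with -1 in values.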

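Importantly, no shape is passed to the constructor, so SciPy infers it from the largest indices: (max(movie_id) + 1, max(user_id) + 1). The IDs themselves are therefore used as row and column positions, so an entry for movie 1000 lands in row 1000 rather than row 0.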
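To make the shape inference concrete, here is a small sketch using the sample IDs. The explicit shape (2000, 10) is a hypothetical size, shown only to illustrate the optional shape argument, which is useful when you want room for IDs that do not appear in the data yet.

```python
import numpy as np
import scipy.sparse

movie_ids = np.array([1000, 1007])  # row indices
user_ids = np.array([5, 6])         # column indices
values = np.ones(2, dtype=int)

# No shape given: SciPy infers (max(row) + 1, max(col) + 1)
inferred = scipy.sparse.coo_matrix((values, (movie_ids, user_ids)))
print(inferred.shape, inferred.nnz)

# Hypothetical explicit shape, larger than the IDs require
explicit = scipy.sparse.coo_matrix((values, (movie_ids, user_ids)), shape=(2000, 10))
print(explicit.shape, explicit.nnz)
```

Although the inferred matrix has 1008 x 7 cells, only the two nonzero entries are stored, so large ID values cost little memory.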
Furthermore, you can convert this matrix to any other sparse format you need, for example CSR with sparse_mat.tocsr().
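One detail worth knowing when converting from COO to CSR: duplicate (row, col) pairs are summed. The sketch below uses hypothetical data in which (movie 1000, user 5) appears twice, so that cell ends up as 2 rather than 1.

```python
import numpy as np
import scipy.sparse

# Hypothetical interactions: the pair (1000, 5) is listed twice
movie_ids = np.array([1000, 1000, 1007])
user_ids = np.array([5, 5, 6])
values = np.array([1, 1, -1])

coo = scipy.sparse.coo_matrix((values, (movie_ids, user_ids)))
csr = coo.tocsr()  # duplicate (row, col) pairs are summed during conversion

print(csr[1000, 5])  # 2
print(csr[1007, 6])  # -1
print(csr.nnz)       # 2
```

If repeated interactions should count only once, deduplicate the (movie_id, user_id) pairs in the DataFrame before building the matrix.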
