
How to create a ratings csr_matrix in scipy?

I have a CSV file in this format:

userId  movieId  rating  timestamp
1       31       2.5     1260759144
2       10       4       835355493
3       1197     5       1298932770
4       10       4       949810645

I want to construct a sparse matrix with userId as rows and movieId as columns. I have stored all the data in a dictionary named "column", where column['user'] contains the user IDs, column['movie'] the movie IDs, and column['rating'] the ratings, as follows:

f = open('ratings.csv','rb')
reader = csv.reader(f)
headers = ['user','movie','rating','timestamp']
column = {}
for h in headers:
    column[h] = []
for row in reader:
    for h, v in zip(headers, row):
        column[h].append(float(v))

When I call the sparse matrix constructor like this:

mat = scipy.sparse.csr_matrix((column['rating'],(column['user'],column['movie'])))

I get "TypeError: invalid shape".

Please help.

scipy.sparse.csr_matrix([column['rating'],column['user'],column['movie']])

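You had a tuple consisting of a 1 x n list and a 2 x n list. Note, however, that the call above builds a 3 x n matrix whose rows are the ratings, the user IDs, and the movie IDs; it avoids the error but is not a user-by-movie ratings matrix. The (data, (row, col)) tuple from the question is the documented constructor form for that, and it expects integer row and column indices, whereas the question's loop stores every value as a float.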
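For comparison, here is a minimal sketch of the (data, (row, col)) form with the ID columns parsed as integers. It assumes the file is comma-separated with a header row; the SAMPLE string is a stand-in for ratings.csv built from the rows in the question, and the variable names are illustrative only.

```python
import csv
import io

from scipy.sparse import csr_matrix

# Stand-in for ratings.csv (assumed comma-separated, with a header row).
SAMPLE = """userId,movieId,rating,timestamp
1,31,2.5,1260759144
2,10,4,835355493
3,1197,5,1298932770
4,10,4,949810645
"""

users, movies, ratings = [], [], []
reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row so it is not parsed as data
for user_id, movie_id, rating, _timestamp in reader:
    users.append(int(user_id))      # row indices: integers
    movies.append(int(movie_id))    # column indices: integers
    ratings.append(float(rating))   # stored values: floats

# Shape is inferred as (max user ID + 1, max movie ID + 1).
mat = csr_matrix((ratings, (users, movies)))
print(mat.shape)   # (5, 1198)
print(mat[1, 31])  # 2.5
```

Because the raw IDs are used directly as indices, row 0 and any unused movie columns stay empty; that costs nothing in a sparse format beyond the row pointer array.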

PS: For reading the data, you should try Pandas :-) ( http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html ). A minimal example:

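import pandas as pd

# Set up a DataFrame from the CSV and make it sparse.
# DataFrame.to_sparse() was removed in pandas 1.0; on newer versions,
# casting to a SparseDtype is the replacement.
df = pd.read_csv('ratings.csv')
df = df.astype(pd.SparseDtype('float', fill_value=0))
print(df.head())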

Try it this way:

import pandas as pd
from scipy.sparse import csr_matrix

# Column names must not contain stray spaces, or df.userId will not resolve
df = pd.read_csv('f:\\train.csv', usecols=[0, 1, 2],
                 names=['userId', 'movieID', 'ratings'], skiprows=1)
utility_csr = csr_matrix((df.ratings, (df.userId, df.movieID)))
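To try this without a file on disk, the sketch below uses an in-memory DataFrame built from the four rows in the question. The second half is an optional variation that is not part of the answer above: it uses pd.factorize to map IDs to consecutive codes, which may be worth considering when the IDs are sparse or very large. The names compact_csr, user_codes, and movie_codes are illustrative.

```python
import pandas as pd
from scipy.sparse import csr_matrix

# In-memory stand-in for the CSV, using the rows from the question.
df = pd.DataFrame({
    'userId': [1, 2, 3, 4],
    'movieID': [31, 10, 1197, 10],
    'ratings': [2.5, 4.0, 5.0, 4.0],
})

# Raw IDs as indices: the shape is (max userId + 1, max movieID + 1).
utility_csr = csr_matrix((df.ratings, (df.userId, df.movieID)))
print(utility_csr.shape)  # (5, 1198)

# Optional: map IDs to consecutive codes so no row or column is left unused.
user_codes, user_ids = pd.factorize(df.userId)
movie_codes, movie_ids = pd.factorize(df.movieID)
compact_csr = csr_matrix(
    (df.ratings.to_numpy(), (user_codes, movie_codes)),
    shape=(len(user_ids), len(movie_ids)),
)
print(compact_csr.shape)  # (4, 3)
```

With the compact form, user_ids[i] and movie_ids[j] recover the original IDs for row i and column j.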
