Transforming Dataset into value matrix

Question

Sorry about the hopeless title.

I have a dataset that looks like:

|userId|movieId|rating|genre1|genre2|
|1     |13     |3.5   |1     |0     |
|1     |412    |2.5   |1     |1     |
|2     |4      |3.0   |0     |1     |
|3     |412    |2.5   |1     |1     |
|4     |13     |4.5   |1     |0     |
|4     |412    |5     |1     |1     |

And so on...

Not every user has rated every movie.

I want to transform this into a matrix that looks like:

|   |1  |2  |3  |4  |
|4  |   |3  |   |   |
|13 |3.5|   |   |4.5|
|412|2.5|   |2.5|5  |

So I have userId as the columns and movieId as the rows with the associated value being the rating given.

What's the best way of doing this?

Edit: The IDs are non-sequential. There are 140k users and 28k movies.

Answer 1

If you have many users and many movies, you could easily run out of memory building a dense matrix. For instance, with 1000 users and 1000 different movies you end up with a matrix of 1M entries, most of which will be missing (since not every user has rated every movie).
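
To put a number on that for the sizes mentioned in the question's edit (140k users, 28k movies), here is a rough back-of-the-envelope sketch. It assumes a dense numeric matrix at 8 bytes per cell and ignores any overhead; the variable names are just for illustration.

```r
# Sizes taken from the question's edit
n_users  <- 140000
n_movies <- 28000

# A dense numeric (double) matrix stores 8 bytes per cell, rated or not
n_cells  <- n_users * n_movies
dense_gb <- n_cells * 8 / 1e9

n_cells   # 3.92e9 cells
dense_gb  # about 31.36 GB
```

A sparse representation only stores the ratings that actually exist, so its size grows with the number of rows in the dataset rather than with users times movies.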

If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both the user and movie IDs are sequential (i.e. they start at 1 and end at the number of distinct values), building it is straightforward. Using @StevenBeaupré's data:

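library(Matrix)
# Rows are users and columns are movies here; swap the first two
# arguments to get movies as rows, as in the question.
mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)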

If the IDs are not sequential:

# as.integer(factor(...)) maps each distinct ID to 1..n in sorted order
mat <- sparseMatrix(i = as.integer(factor(df$userId)),
                    j = as.integer(factor(df$movieId)), x = df$rating)
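
One catch with the integer codes is that the original IDs are no longer visible in the result. A minimal sketch of one way around that, assuming the df from the Data section (re-created here with only the three columns needed): keep the factors, pass their levels as dimnames, and put movies in rows as the question asks. The names movie_f and user_f are made up for this example.

```r
library(Matrix)

# Same values as the df in the Data section, minus the genre columns
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

# Factors give compact 1..n codes; their levels remember the real IDs
movie_f <- factor(df$movieId)
user_f  <- factor(df$userId)

# Movies as rows, users as columns, labelled with the original IDs
mat <- sparseMatrix(
  i = as.integer(movie_f),
  j = as.integer(user_f),
  x = df$rating,
  dimnames = list(levels(movie_f), levels(user_f))
)

mat["13", "1"]   # rating that user 1 gave movie 13
```

As far as I know, sparseMatrix adds up the values of repeated (i, j) pairs by default, so if a user could have rated the same movie twice it is worth de-duplicating first.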

You can basically perform any matrix operation on a sparseMatrix too.
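
As a small illustration of that, here is a sketch that computes the mean rating per movie directly on the sparse matrix. It assumes the sample data from the Data section and that a stored rating is never 0, since an empty cell and a 0 cannot be told apart this way; n_ratings and mean_rating are names invented for the example.

```r
library(Matrix)

# Same values as the df in the Data section, minus the genre columns
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

movie_f <- factor(df$movieId)
user_f  <- factor(df$userId)
mat <- sparseMatrix(i = as.integer(movie_f), j = as.integer(user_f),
                    x = df$rating,
                    dimnames = list(levels(movie_f), levels(user_f)))

# Number of ratings per movie: count the non-zero cells in each row
n_ratings <- rowSums(mat != 0)

# Mean rating per movie, ignoring the empty cells
mean_rating <- rowSums(mat) / n_ratings
mean_rating
```

Both steps stay sparse internally, so the same idea should carry over to the full 140k by 28k case without building the dense matrix.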

Answer 2

Try

library(dplyr)
library(tidyr)

df %>%
  select(-(genre1:genre2)) %>%
  spread(userId, rating, fill = "")

Which gives:

#  movieId   1 2   3   4
#1       4     3        
#2      13 3.5       4.5
#3     412 2.5   2.5   5
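
Two things to be aware of: filling with "" means the rating columns end up as text rather than numbers, and in newer tidyr releases spread() has been superseded by pivot_wider(). A minimal sketch of the same reshape with pivot_wider(), assuming the sample data below and leaving unrated cells as NA so the result stays numeric; wide and m are illustrative names.

```r
library(dplyr)
library(tidyr)

# Same values as the df in the Data section, minus the genre columns
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

# One row per movie, one column per user; missing ratings stay NA
wide <- df %>%
  pivot_wider(id_cols = movieId, names_from = userId, values_from = rating) %>%
  arrange(movieId)

# Optional: turn it into a plain numeric matrix with movie IDs as row names
m <- as.matrix(wide[, -1])
rownames(m) <- wide$movieId
m
```

This is a dense result, so it suits small data like the sample; for the sizes in the question's edit the memory concern raised in Answer 1 applies.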

Data

df <- structure(list(userId = c(1L, 1L, 2L, 3L, 4L, 4L), movieId = c(13L, 
412L, 4L, 412L, 13L, 412L), rating = c(3.5, 2.5, 3, 2.5, 4.5, 
5), genre1 = c(1L, 1L, 0L, 1L, 1L, 1L), genre2 = c(0L, 1L, 1L, 
1L, 0L, 1L)), .Names = c("userId", "movieId", "rating", "genre1", 
"genre2"), class = "data.frame", row.names = c(NA, -6L))


solution1
5 ACCEPTED 2015-10-31 23:30:59

solution2
2 2015-10-31 23:16:54
