Sorry about the hopeless title..
I have a dataset that looks like:
|userId|movieId|rating|genre1|genre2|
|1 |13 |3.5 |1 |0 |
|1 |412 |2.5 |1 |1 |
|2 |4 |3.0 |0 |1 |
|3 |412 |2.5 |1 |1 |
|4 |13 |4.5 |1 |0 |
|4 |412 |5 |1 |1 |
And so on...
Not every user has rated every movie.
I want to transform this into a matrix that looks like:
| |1 |2 |3 |4 |
|4 | |3 | | |
|13 |3.5| | |4.5|
|412|2.5| |2.5|5 |
So I have userId as the columns and movieId as the rows with the associated value being the rating given.
What's the best way of doing this?
Edit: The IDs are non-sequential. There are 140k users and 28k movies.
If you have many users and many movies, you can easily run out of memory when building a dense matrix. For instance, with 1,000 users and 1,000 movies you end up with a matrix of 1M entries, most of which will be missing (since not every user has seen every movie).
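To put numbers on this for the sizes mentioned in the question (140k users, 28k movies), here is a rough back-of-the-envelope calculation. It assumes ratings are stored as 8-byte doubles, which is R's default numeric type; the variable names are only illustrative.

```r
# Sizes taken from the question's edit
n_users  <- 140000
n_movies <- 28000

# A dense numeric matrix stores one 8-byte double per cell,
# whether or not the user actually rated the movie
bytes_per_double <- 8
dense_bytes <- n_users * n_movies * bytes_per_double
dense_gib   <- dense_bytes / 1024^3

dense_gib  # roughly 29 GiB for the dense matrix alone
```

A sparse representation only stores the ratings that exist, so its size grows with the number of ratings rather than with users times movies.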
If your dataset is big, a sparseMatrix from the Matrix package is the way to go. If both user and movie IDs are sequential (i.e. they start at 1 and end at the number of distinct entries), building it is straightforward. Using @StevenBeaupré's data:
library(Matrix)
# rows = userId, columns = movieId; swap i and j to get movies as rows
mat <- sparseMatrix(i = df$userId, j = df$movieId, x = df$rating)
If the IDs are not sequential:
mat <- sparseMatrix(i = as.integer(factor(df$userId)),
                    j = as.integer(factor(df$movieId)),
                    x = df$rating)
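Converting the IDs through factor() compacts the indices but drops the original IDs from the result. A minimal sketch of one way to keep them, assuming you want movies as rows and users as columns as in the question: pass the factor levels as dimnames. The small df repeats the sample data from this thread, and movie_f and user_f are just illustrative names.

```r
library(Matrix)

# Sample data from the thread (genre columns omitted)
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

# Factors map the non-sequential IDs to 1..n and remember the originals
movie_f <- factor(df$movieId)
user_f  <- factor(df$userId)

mat <- sparseMatrix(
  i = as.integer(movie_f),   # rows: movies
  j = as.integer(user_f),    # columns: users
  x = df$rating,
  dims = c(nlevels(movie_f), nlevels(user_f)),
  dimnames = list(levels(movie_f), levels(user_f))
)

# Look up a rating by the original IDs (as character strings)
mat["412", "4"]
```

Unrated movie/user pairs are simply not stored; when read back they appear as 0, not NA.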
You can perform essentially any matrix operation on a sparseMatrix too.
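One caveat when doing so: a sparse matrix treats missing ratings as zeros, so something like rowMeans() would average over all users, not only those who rated the movie. A small sketch, using the thread's sample data with movies as rows, that divides the row sums by the number of stored ratings instead. It assumes no real rating is exactly 0.

```r
library(Matrix)

# Sample ratings: rows are movies (4, 13, 412), columns are users (1..4)
mat <- sparseMatrix(
  i = c(2, 3, 1, 3, 2, 3),
  j = c(1, 1, 2, 3, 4, 4),
  x = c(3.5, 2.5, 3, 2.5, 4.5, 5),
  dimnames = list(c("4", "13", "412"), c("1", "2", "3", "4"))
)

# Number of actual ratings per movie (non-zero cells)
n_ratings <- rowSums(mat != 0)

# Mean over rated cells only, not over all users
mean_rating <- rowSums(mat) / n_ratings
mean_rating
```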
Try:
library(dplyr)
library(tidyr)
df %>%
  select(-(genre1:genre2)) %>%
  spread(userId, rating, fill = "")
Which gives:
#   movieId   1 2   3   4
# 1       4     3
# 2      13 3.5       4.5
# 3     412 2.5   2.5   5
Data
df <- structure(list(userId = c(1L, 1L, 2L, 3L, 4L, 4L), movieId = c(13L,
412L, 4L, 412L, 13L, 412L), rating = c(3.5, 2.5, 3, 2.5, 4.5,
5), genre1 = c(1L, 1L, 0L, 1L, 1L, 1L), genre2 = c(0L, 1L, 1L,
1L, 0L, 1L)), .Names = c("userId", "movieId", "rating", "genre1",
"genre2"), class = "data.frame", row.names = c(NA, -6L))
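In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A sketch of the same reshaping with the newer function, assuming a recent tidyr is installed; here unrated cells are left as NA rather than filled with empty strings, so the rating columns stay numeric. The df below repeats the sample data without the genre columns.

```r
library(dplyr)
library(tidyr)

# Sample data from the thread (genre columns omitted)
df <- data.frame(
  userId  = c(1L, 1L, 2L, 3L, 4L, 4L),
  movieId = c(13L, 412L, 4L, 412L, 13L, 412L),
  rating  = c(3.5, 2.5, 3, 2.5, 4.5, 5)
)

wide <- df %>%
  select(userId, movieId, rating) %>%
  # one column per userId, cell values taken from rating
  pivot_wider(names_from = userId, values_from = rating) %>%
  # sort rows by movieId to match the layout in the question
  arrange(movieId)

wide
```

Like spread(), this builds a dense table, so it suits small data; for 140k users by 28k movies the sparse approach is the safer choice.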