I am trying to extract the movie genre strings from a data set. The data is in the following format, where the genre types are scattered across the columns by different reviewers. Luckily there are only 4 genre types (comedy, action, horror, scifi) in the dataset, but there are also repetitions. So I need to extract those strings from the dataset.
id movie v1 v2 v3 v4 v5 v6
1 LTR comedy highbudget action comedy jj horror
2 MI newmovie fiction scifi funny xx jhee
I am expecting an output of the following form.
id movie genretype1 genretype2 genretype3 genretype4
1 LTR comedy action comedy horror
2 MI scifi --- --- ---
Any suggestions?
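This is how I would do it. It makes more sense to use a list, not a data.frame, because each movie can have a different number of genres. Note that %in% reports each genre at most once per movie, so the repeated "comedy" for LTR is dropped: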
> types = c("comedy", "action", "horror", "scifi")
> List = apply(df, 1, function(x) types[types %in% x[-c(1, 2)]])
> names(List) <- df$movie
> List
$LTR
[1] "comedy" "action" "horror"
$MI
[1] "scifi"
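If the repeated genres matter, as they do in the expected output in the question, a variant of the list approach could keep them. The following is only a sketch under assumptions: the construction of `df1` is reconstructed from the two sample rows in the question (character columns are assumed), and the names `unique_genres` and `all_genres` are made up for illustration. It loops over row indices with `lapply` rather than `apply`, since `apply` may simplify its result to a matrix when every row yields the same number of matches.

```r
# Sample data reconstructed from the question (column types are an assumption)
df1 <- data.frame(
  id    = 1:2,
  movie = c("LTR", "MI"),
  v1 = c("comedy", "newmovie"),
  v2 = c("highbudget", "fiction"),
  v3 = c("action", "scifi"),
  v4 = c("comedy", "funny"),
  v5 = c("jj", "xx"),
  v6 = c("horror", "jhee"),
  stringsAsFactors = FALSE
)
types <- c("comedy", "action", "horror", "scifi")

# Each genre at most once per movie: which of 'types' occur in the row?
unique_genres <- lapply(seq_len(nrow(df1)), function(i) {
  x <- unlist(df1[i, -(1:2)], use.names = FALSE)
  types[types %in% x]
})
names(unique_genres) <- df1$movie

# Repeats kept, in row order: which row values are one of 'types'?
all_genres <- lapply(seq_len(nrow(df1)), function(i) {
  x <- unlist(df1[i, -(1:2)], use.names = FALSE)
  x[x %in% types]
})
names(all_genres) <- df1$movie
```

Here `all_genres$LTR` contains "comedy" twice, while `unique_genres$LTR` lists it once.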
Alternatively, this solution could give you a tidy data.frame:
> Matrix = t(apply(df, 1, function(x) types %in% x[-c(1, 2)]))
> colnames(Matrix) = types
> cbind(df[,1:2], Matrix)
id movie comedy action horror scifi
1 1 LTR TRUE TRUE TRUE FALSE
2 2 MI FALSE FALSE FALSE TRUE
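A TRUE/FALSE table hides how often a genre was given. If the repetitions should be preserved in this wide layout, one possible variation is to count the occurrences instead. This is a sketch, not the answerer's method: `df1` is reconstructed from the question's sample rows, and `counts` and `out` are hypothetical names.

```r
# Sample data reconstructed from the question (column types are an assumption)
df1 <- data.frame(
  id    = 1:2,
  movie = c("LTR", "MI"),
  v1 = c("comedy", "newmovie"),
  v2 = c("highbudget", "fiction"),
  v3 = c("action", "scifi"),
  v4 = c("comedy", "funny"),
  v5 = c("jj", "xx"),
  v6 = c("horror", "jhee"),
  stringsAsFactors = FALSE
)
types <- c("comedy", "action", "horror", "scifi")

# For every row, count how many times each genre in 'types' appears.
# vapply returns one column per row of df1, so transpose afterwards.
counts <- t(vapply(seq_len(nrow(df1)), function(i) {
  x <- unlist(df1[i, -(1:2)], use.names = FALSE)
  vapply(types, function(g) sum(x == g), integer(1))
}, integer(length(types))))

# Identifier columns followed by one count column per genre
out <- cbind(df1[1:2], counts)
```

For the sample data, the comedy column of `out` is 2 for LTR and 0 for MI.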
We can match the 'types' against each row of 'df1', excluding the first two identifier columns. The list elements in 'lst1' may not all have the same length, so we make the lengths equal by padding the shorter elements with NA up to the length of the longest one, rbind the list elements, and create a new data.frame.
types <- c("comedy", "action", "horror", "scifi")
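# For each row, keep the values that are one of 'types' (repeats included)
lst1 <- apply(df1[-(1:2)], 1, function(x)
    types[match(x, types, nomatch = 0)])
# Pad every element with NA to the longest length, then bind them as rows
res <- data.frame(df1[1:2], do.call(rbind, lapply(lst1,
    'length<-', max(lengths(lst1)))))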
res
# id movie X1 X2 X3 X4
#1 1 LTR comedy action comedy horror
#2 2 MI scifi <NA> <NA> <NA>
NOTE: We can change the column names if needed.
colnames(res)[-(1:2)] <- paste0('genretype', 1:4)
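To get the "---" placeholders from the expected output rather than NA, the padding step could fill with that string directly. The sketch below is one way to do it under stated assumptions: `df1` is reconstructed from the question's sample rows, `n_max` and `padded` are names introduced here, and at least one row is assumed to contain a genre. It also uses `lapply` over row indices, which always returns a list, whereas `apply` may return a matrix when all rows have the same number of matches.

```r
# Sample data reconstructed from the question (column types are an assumption)
df1 <- data.frame(
  id    = 1:2,
  movie = c("LTR", "MI"),
  v1 = c("comedy", "newmovie"),
  v2 = c("highbudget", "fiction"),
  v3 = c("action", "scifi"),
  v4 = c("comedy", "funny"),
  v5 = c("jj", "xx"),
  v6 = c("horror", "jhee"),
  stringsAsFactors = FALSE
)
types <- c("comedy", "action", "horror", "scifi")

# Genres per row, in order of appearance, repeats included
lst1 <- lapply(seq_len(nrow(df1)), function(i) {
  x <- unlist(df1[i, -(1:2)], use.names = FALSE)
  x[x %in% types]
})

# Pad every element with "---" up to the longest length, then bind as rows
n_max  <- max(lengths(lst1))
padded <- do.call(rbind, lapply(lst1, function(g) {
  c(g, rep("---", n_max - length(g)))
}))
colnames(padded) <- paste0("genretype", seq_len(n_max))

res <- data.frame(df1[1:2], padded, stringsAsFactors = FALSE)
```

Deriving the column names from `n_max` avoids hard-coding the number of genre columns.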