How to loop through columns and then add them to a data frame
The problem I am working on requires me to go through a large data frame of actors and actresses and create a profile for each of them based on the movies they have already acted in. I wrote the profile function below, which works for a single actor or actress, but I am having trouble looping it over the entire data frame to get a profile for every actor.
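library(rpart)

# Return the predicted genre for one actor ID
profile <- function(actor, df) {
  dftest <- subset(df, df$id.x == actor)                        # rows for this actor
  fit <- rpart(name.y ~ id.x, method = "class", data = dftest)  # classification tree
  p <- predict(fit, dftest)                                     # class probabilities
  colnames(p)[max.col(p, ties.method = "first")][1]             # most probable genre
}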
Here is the for loop I have so far, but I keep getting errors and am unsure whether I am going about this correctly.
for (k in c(1)) {
  user.frame <- data.frame()
  for (i in df2$id.x) { # df2$id.x is the column of actor IDs
    user.frame[i] <- data.frame(profile(i, df2))
  }
  df2final <- rbind(final, user.frame)
  View(df2fin)
}
**Edit**
This is the data that goes into the profile function:
# A tibble: 278,361 x 7
   movie_id title   id.x name.x            id.y   id1 name.y
      <int> <chr>  <int> <chr>            <int> <int> <chr>
 1    19995 Avatar 65731 Sam Worthington  19995    28 Action
 2    19995 Avatar 65731 Sam Worthington  19995    12 Adventure
 3    19995 Avatar 65731 Sam Worthington  19995    14 Fantasy
 4    19995 Avatar 65731 Sam Worthington  19995   878 Science Fiction
 5    19995 Avatar  8691 Zoe Saldana      19995    28 Action
 6    19995 Avatar  8691 Zoe Saldana      19995    12 Adventure
 7    19995 Avatar  8691 Zoe Saldana      19995    14 Fantasy
 8    19995 Avatar  8691 Zoe Saldana      19995   878 Science Fiction
 9    19995 Avatar 10205 Sigourney Weaver 19995    28 Action
10    19995 Avatar 10205 Sigourney Weaver 19995    12 Adventure
Ideally, I would like the for loop to give me a data frame at the end that has the actor ID in one column and the profile genre next to it. The error I keep getting is this:
Error in `[<-.data.frame`(`*tmp*`, i, value = list(profile.i..df2. = 1L)) :
new columns would leave holes after existing columns
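The message comes from the assignment `user.frame[i] <- ...`: with a single index, `i` is treated as a column position, and here `i` is an actor ID such as 65731, so R is asked to create column 65731 in a data frame that has no columns yet. Below is a minimal sketch that reproduces the message and then collects one value per unique actor instead. Note that `toy` and `most_common` are hypothetical stand-ins for `df2` and `profile()`, not the original data or model.

```r
# Toy stand-in for df2 (hypothetical data, same column names as the question)
toy <- data.frame(
  id.x   = c(65731L, 65731L, 65731L, 8691L, 8691L),
  name.y = c("Action", "Action", "Fantasy", "Drama", "Drama"),
  stringsAsFactors = FALSE
)

# Reproduce the error: a single numeric index selects a column position
user.frame <- data.frame()
msg <- tryCatch({
  user.frame[65731L] <- data.frame(genre = "Action")
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical stand-in for profile(): most frequent genre for one actor
most_common <- function(actor, df) {
  genres <- df$name.y[df$id.x == actor]
  names(which.max(table(genres)))
}

# Loop over the unique IDs and build the data frame once, at the end
ids <- unique(toy$id.x)
genres <- vapply(ids, most_common, character(1), df = toy)
profiles <- data.frame(id.x = ids, profile = genres, stringsAsFactors = FALSE)
print(profiles)
```

Iterating over `unique(toy$id.x)` rather than the full column also avoids refitting the same actor once per row.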
So I used your code and did this:
df <- data.frame(a = rep(df2$id.x))
df$b <- df2$name.y
actor.names <- unique(df$a)
result.file <- matrix(ncol = 2, nrow = length(actor.names))
for(i in 1:length(actor.names)){
  dftest <- subset(df, a == actor.names[i]) #subset actor name
  fit <- rpart::rpart(b ~ a, method = "class", dftest) #run model
  p <- predict(fit, dftest) #predict genre
  temp <- colnames(p)[max.col(p,ties.method="first")][1]
  result.file[i,1] <- actor.names[i]
  result.file[i,2] <- temp
}
It gave me this error:
Error in cbind(yval2, yprob, nodeprob) :
number of rows of matrices must match (see arg 2)
But the results that I got were what I needed. Should I be worried?
result.file
      [,1]    [,2]
 [1,] "65731" "Action"
 [2,] "8691"  "Action"
 [3,] "10205" "Comedy"
 [4,] "32747" "Action"
 [5,] "17647" "Action"
 [6,] "1771"  "Drama"
 [7,] "59231" "Comedy"
 [8,] "30485" "Action"
 [9,] "15853" "Adventure"
[10,] "10964" "Drama"
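One thing worth checking: a `for` loop stops at the first error, so the row for the failing actor and every row after it in `result.file` stay `NA` even if the first rows look right. Here is a sketch of one way to find out which actors trigger the error, by wrapping each iteration in `tryCatch()`. `fit_one` is a hypothetical placeholder for the rpart/predict step and fails on purpose for one ID; the IDs are taken from the output above.

```r
# Hypothetical placeholder for the model step; it fails for one ID on purpose
fit_one <- function(actor) {
  if (actor == 10205L) stop("number of rows of matrices must match")
  "Action"
}

actor.names <- c(65731L, 8691L, 10205L, 32747L)
result <- data.frame(id = actor.names, genre = NA_character_,
                     stringsAsFactors = FALSE)
failed <- integer(0)

for (i in seq_along(actor.names)) {
  result$genre[i] <- tryCatch(
    fit_one(actor.names[i]),
    error = function(e) {
      failed <<- c(failed, actor.names[i])  # remember the failing ID
      NA_character_                         # keep the loop going
    }
  )
}

print(result)
print(failed)  # IDs whose subsets should be inspected by hand
```

On the real data, `sum(is.na(result.file[, 2]))` would show how many actors never received a genre.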
Here is the dput(head(df)):
dput(head(df))
structure(list(a = c(65731L, 65731L, 65731L, 65731L, 8691L, 8691L
), b = c("Action", "Adventure", "Fantasy", "Science Fiction",
"Action", "Adventure")), .Names = c("a", "b"), row.names = c(NA,
6L), class = "data.frame")
I think you want something like this:
# generate data (4 actors, each with 2x Genre1 and 1x Genre3)
df <- data.frame(a = rep(c("Actor1", "Actor2", "Actor3", "Actor4"), each = 3),
                 b = rep(c("Genre1", "Genre1", "Genre3"), 4),
                 stringsAsFactors = FALSE)
# create a vector with the unique actor names
actor.names <- unique(df$a)
# create a storage matrix for the results (one row per actor)
result.file <- matrix(ncol = 2,
                      nrow = length(actor.names))
for (i in seq_along(actor.names)) {
  dftest <- subset(df, a == actor.names[i])                    # subset one actor
  fit <- rpart::rpart(b ~ a, method = "class", data = dftest)  # run model
  p <- predict(fit, dftest)                                    # predict genre
  temp <- colnames(p)[max.col(p, ties.method = "first")][1]
  result.file[i, 1] <- actor.names[i]
  result.file[i, 2] <- temp
}
result.file
     [,1]     [,2]
[1,] "Actor1" "Genre1"
[2,] "Actor2" "Genre1"
[3,] "Actor3" "Genre1"
[4,] "Actor4" "Genre1"
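Since each subset contains a single actor, `a` is constant within it and the tree has nothing to split on, so the predicted class should simply be that actor's most frequent genre. Under that assumption, here is a sketch that skips `rpart` and tabulates the genres per actor directly. It returns a data frame, so integer IDs would keep their type instead of being coerced to strings as in a matrix. `top_genre` is a name introduced here for illustration only.

```r
# Same toy data as above
df <- data.frame(a = rep(c("Actor1", "Actor2", "Actor3", "Actor4"), each = 3),
                 b = rep(c("Genre1", "Genre1", "Genre3"), 4),
                 stringsAsFactors = FALSE)

# Most frequent value; ties go to the genre that sorts first in table()
top_genre <- function(x) names(which.max(table(x)))

# One genre per unique actor
actor.names <- unique(df$a)
genres <- vapply(actor.names,
                 function(id) top_genre(df$b[df$a == id]),
                 character(1), USE.NAMES = FALSE)

result <- data.frame(actor = actor.names, genre = genres,
                     stringsAsFactors = FALSE)
print(result)
```

If the two approaches disagree for some actor on the real data, that actor's subset is probably the first place to look.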