How to add rows to an empty data frame in R
I've tried everything, but I can't add rows to an empty data frame. The first row is added, but from the second row to the end I get this warning: invalid factor level, NA generated. I hope you can help me! Thank you very much for your help!
Table <- data.frame()
for (i in 1:length(dfMoviesList)){
ID = paste0("DF",i)
Value = (dfMoviesList[[i]]$TITULO[1])
Table <- rbind(Table,c(ID,Value))
}
You could set stringsAsFactors=FALSE before running the code:
op <- options(stringsAsFactors=FALSE)
Table <- data.frame()
for (i in 1:length(dfMoviesList)){
ID = paste0("DF",i)
Value = (dfMoviesList[[i]]$TITUL0[1])
Table <- rbind(Table,c(ID,Value))
}
options(op) # restore the previous option settings
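For context, in R versions before 4.0.0 the default was stringsAsFactors=TRUE, so the first rbind() created factor columns whose only levels came from the first row, and later rows with unseen values became NA. A minimal sketch of a way to avoid this without touching global options, assuming the titles are character strings; the `movies` list and its titles are made-up illustration data, not the asker's:

```r
# Hypothetical stand-in for dfMoviesList: each element has a TITULO column.
movies <- list(
  data.frame(TITULO = c("Alien", "Brazil"), stringsAsFactors = FALSE),
  data.frame(TITULO = c("Casablanca", "Duel"), stringsAsFactors = FALSE)
)

# Start from a zero-row data frame whose columns already have names and types.
Table <- data.frame(ID = character(0), Value = character(0),
                    stringsAsFactors = FALSE)

for (i in seq_along(movies)) {
  # Bind a one-row data frame with matching column names, not a bare vector.
  row <- data.frame(ID = paste0("DF", i),
                    Value = movies[[i]]$TITULO[1],
                    stringsAsFactors = FALSE)
  Table <- rbind(Table, row)
}

str(Table)
```

Because every piece passed to rbind() is a data frame with character columns, no factor levels are involved at any step.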
If speed is an issue, you could also try:
ID <- paste0('DF', seq_along(dfMoviesList))
res <- data.frame(ID, Value=vapply(dfMoviesList,
    function(x) x$TITUL0[1], numeric(1L)))
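If the first TITUL0 value is text rather than a number, the template passed to vapply() would need to be character(1L) instead of numeric(1L). A minimal sketch on made-up data (the `movies` list and its titles are hypothetical):

```r
# Hypothetical list of data frames whose TITUL0 column holds text.
movies <- list(
  data.frame(TITUL0 = c("Alien", "Brazil"), stringsAsFactors = FALSE),
  data.frame(TITUL0 = c("Casablanca", "Duel"), stringsAsFactors = FALSE)
)

ID <- paste0("DF", seq_along(movies))

# character(1L) tells vapply() that each call returns exactly one string.
res <- data.frame(ID,
                  Value = vapply(movies, function(x) x$TITUL0[1], character(1L)),
                  stringsAsFactors = FALSE)
res
```

vapply() stops with an error if any element returns something other than a single string, which makes a mismatch between the data and the template easy to spot.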
Sample data:
set.seed(24)
dfMoviesList <- lapply(1:3, function(i)
data.frame(TITUL0= sample(1:5), val=rnorm(5)) )
Benchmarks on a larger list:
set.seed(24)
dfMoviesList <- lapply(1:10000, function(i)
data.frame(TITUL0= sample(1:5)))
akrun1 <- function() {
  ID <- paste0('DF', seq_along(dfMoviesList))
  data.frame(ID, Value=vapply(dfMoviesList,
      function(x) x$TITUL0[1], numeric(1L)))
}
#included a data.table solution also
library(data.table)
akrun2 <- function() {
  DT <- rbindlist(setNames(dfMoviesList,
      paste0('DF', seq_along(dfMoviesList))), idcol=TRUE)
  DT[DT[, .I[1L], .id]$V1]
}
dariober <- function(){
  Table<- matrix(nrow= length(dfMoviesList), ncol= 2, data= NA)
  for (i in 1:length(dfMoviesList)){
    ID<- paste0("DF",i)
    Value<- dfMoviesList[[i]]$TITUL0[1]
    Table[i,]<- c(ID, Value)
  }
  Table<- data.frame(ID= Table[,1], Value= Table[,2])
}
library(microbenchmark)
microbenchmark(akrun1(), akrun2(), dariober(), times=20L,
unit='relative')
#Unit: relative
#       expr      min       lq     mean   median       uq      max neval cld
#   akrun1() 2.214390 2.193538 2.055775 2.173440 2.148606 1.615028    20  b
#   akrun2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20 a
# dariober() 3.226717 3.198742 2.984970 3.174609 3.139982 2.189399    20   c
Since you know beforehand how many rows and columns the final data.frame will have, it's much faster to initialize an empty matrix of the right size, fill it, and convert it to a data.frame.
In the for loop you propose, the Table object is discarded and recreated at every iteration, which gets really slow even when the number of iterations is not very large. See for example:
A sample of 10000 movies:
dfMoviesList <- lapply(1:10000, function(i)
data.frame(TITUL0= sample(1:5)))
Empty matrix strategy:
system.time({
Table<- matrix(nrow= length(dfMoviesList), ncol= 2, data= NA)
for (i in 1:length(dfMoviesList)){
ID<- paste0("DF",i)
Value<- dfMoviesList[[i]]$TITUL0[1]
Table[i,]<- c(ID, Value)
}
Table<- data.frame(ID= Table[,1], Value= Table[,2])
})
user system elapsed
0.129 0.001 0.130
Compare to:
system.time({
op <- options(stringsAsFactors=FALSE)
Table <- data.frame()
for (i in 1:length(dfMoviesList)){
ID = paste0("DF",i)
Value = (dfMoviesList[[i]]$TITUL0[1])
Table <- rbind(Table,c(ID,Value))
}
options(op)
})
user system elapsed
12.316 2.855 15.180
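Another way to avoid rebuilding Table on every iteration, sketched here in base R under the assumption that each list element contributes exactly one row: collect the rows in a preallocated list and call rbind() once at the end. The name `rows` is introduced only for this illustration, and the list is kept smaller than in the timings above.

```r
set.seed(24)
dfMoviesList <- lapply(1:1000, function(i) data.frame(TITUL0 = sample(1:5)))

# Preallocate a list with one slot per output row.
rows <- vector("list", length(dfMoviesList))

for (i in seq_along(dfMoviesList)) {
  rows[[i]] <- data.frame(ID = paste0("DF", i),
                          Value = dfMoviesList[[i]]$TITUL0[1],
                          stringsAsFactors = FALSE)
}

# Combine all one-row data frames in a single call.
Table <- do.call(rbind, rows)
head(Table)
```

Unlike the matrix approach, where c(ID, Value) coerces every value to character because a matrix holds a single type, this keeps the Value column in its original type.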