How to add rows in an empty data frame in R

I've tried everything, but I can't add rows to an empty data frame. The first row is added, but from the second row to the end I get this warning: invalid factor level, NA generated. I hope you can help me! Thank you very much for your help!

   Table <- data.frame()

   for (i in 1:length(dfMoviesList)){

       ID = paste0("DF",i)
       Value = (dfMoviesList[[i]]$TITULO[1])
       Table <- rbind(Table,c(ID,Value)) 
   }
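The warning in the question is what R emits whenever a value that is not one of a factor's existing levels is assigned into a factor. A minimal reproduction on a hypothetical factor `titles` (not from the question), capturing the warning text instead of printing it:

```r
warn_msg <- NULL
titles <- factor("DF1")                 # levels are fixed at creation: only "DF1"

withCallingHandlers(
  titles[2] <- "DF2",                   # "DF2" is not an existing level
  warning = function(w) {
    warn_msg <<- conditionMessage(w)    # record the warning text
    invokeRestart("muffleWarning")      # and continue without printing it
  }
)

print(warn_msg)  # "invalid factor level, NA generated"
print(titles)    # first element DF1, second element NA; levels still only "DF1"
```

A plain character vector has no fixed set of levels, which is why keeping the columns as character avoids the problem.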

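Before R 4.0.0, `data.frame()` and `rbind()` converted strings to factors by default, so the first `rbind()` produces factor columns whose only levels are the first row's values; every later row brings values that are not existing levels, hence the NAs. You could set stringsAsFactors=FALSE before running the code: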

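op <- options(stringsAsFactors=FALSE)  # save the previous settings in op
Table <- data.frame()
for (i in 1:length(dfMoviesList)){
    ID = paste0("DF",i)
    # TITUL0 (with a zero) matches the sample data below; the question's column is TITULO
    Value = (dfMoviesList[[i]]$TITUL0[1])
    Table <- rbind(Table,c(ID,Value)) 
}
options(op) # restore the previous settings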
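The global option is not strictly needed: the loop can ask for character columns directly by binding one-row data frames created with `stringsAsFactors = FALSE` (which is already the default since R 4.0.0). A sketch, assuming the question's `TITULO` column; the small `dfMoviesList` below is hypothetical stand-in data:

```r
# Hypothetical stand-in for the question's list of data frames
dfMoviesList <- list(
  data.frame(TITULO = c("Alien", "Heat"), stringsAsFactors = FALSE),
  data.frame(TITULO = c("Brazil", "Ran"), stringsAsFactors = FALSE)
)

Table <- data.frame()
for (i in seq_along(dfMoviesList)) {
  # A one-row data frame makes the column names and types explicit
  new_row <- data.frame(ID = paste0("DF", i),
                        Value = dfMoviesList[[i]]$TITULO[1],
                        stringsAsFactors = FALSE)
  Table <- rbind(Table, new_row)
}
str(Table)
```

This fixes the factor problem and also gives the columns proper names (`ID`, `Value`), but it still rebuilds `Table` on every iteration, so it shares the speed problem discussed further down.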

Update

If speed is an issue, you could also try:

ID <- paste0('DF', seq_along(dfMoviesList))
# numeric(1L) is the FUN.VALUE template: each call must return a single number
res <- data.frame(ID, Value=vapply(dfMoviesList, 
          function(x) x$TITUL0[1], numeric(1L)))

Data

set.seed(24)
 dfMoviesList <- lapply(1:3, function(i) 
    data.frame(TITUL0= sample(1:5), val=rnorm(5)) )
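`vapply()` checks every result against the `FUN.VALUE` template, so `numeric(1L)` only fits numeric columns such as the integer `TITUL0` in the sample data above. Real movie titles are text (and may be stored as factors under the pre-4.0 default), so a sketch for that case uses a `character(1L)` template plus `as.character()`; the toy list is hypothetical:

```r
# Hypothetical list whose title column holds text; the second one is stored as a factor
dfMoviesList <- list(
  data.frame(TITULO = c("Alien", "Heat"), stringsAsFactors = FALSE),
  data.frame(TITULO = factor(c("Brazil", "Ran")))
)

ID <- paste0("DF", seq_along(dfMoviesList))
# as.character() turns a factor into its label; character(1L) requires one string per call
Value <- vapply(dfMoviesList, function(x) as.character(x$TITULO[1]), character(1L))
res <- data.frame(ID, Value, stringsAsFactors = FALSE)
res
```

With `numeric(1L)` on text columns, `vapply()` stops with an error instead of silently producing NAs, which makes a type mismatch easy to spot.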

Benchmarks

set.seed(24)
dfMoviesList <- lapply(1:10000, function(i) 
        data.frame(TITUL0= sample(1:5)))

akrun1 <- function() { ID <- paste0('DF', seq_along(dfMoviesList))
                   data.frame(ID, Value=vapply(dfMoviesList, 
             function(x) x$TITUL0[1], numeric(1L)))
                    }
#included a data.table solution also
library(data.table)
akrun2 <- function() {DT <-  rbindlist(setNames(dfMoviesList, 
      paste0('DF', seq_along(dfMoviesList))), idcol=TRUE)
                     DT[DT[, .I[1L], .id]$V1]}   


dariober <- function(){
  Table <- matrix(nrow= length(dfMoviesList), ncol= 2, data= NA)

  for (i in 1:length(dfMoviesList)){
    ID <- paste0("DF",i)
    Value <- dfMoviesList[[i]]$TITUL0[1]
    Table[i,] <- c(ID, Value)
  }
  Table <- data.frame(ID= Table[,1], Value= Table[,2])
}

 library(microbenchmark)

 microbenchmark(akrun1(), akrun2(), dariober(), times=20L, 
          unit='relative')
 #Unit: relative
 #     expr      min       lq     mean   median       uq      max neval cld
 #   akrun1() 2.214390 2.193538 2.055775 2.173440 2.148606 1.615028    20  b 
 #   akrun2() 1.000000 1.000000 1.000000 1.000000 1.000000 1.000000    20 a  
 # dariober() 3.226717 3.198742 2.984970 3.174609 3.139982 2.189399    20   c
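In `akrun2()`, `DT[, .I[1L], .id]` returns, for each `.id` group, the row number of that group's first row (in a column named `V1`), and the outer `DT[...]` keeps just those rows. For readers without data.table, the same "first row per group" selection in base R is `!duplicated()` on the ID column; a sketch on a small made-up table standing in for the `rbindlist()` result:

```r
# Toy long table: one block of rows per original data frame, labelled by .id
long <- data.frame(.id = c("DF1", "DF1", "DF2", "DF2", "DF2"),
                   TITUL0 = c(3L, 1L, 5L, 2L, 4L))

# !duplicated() is TRUE only for the first occurrence of each .id value
first_rows <- long[!duplicated(long$.id), ]
first_rows
```

This base R version is not part of the benchmark above, so no claim is made here about its speed relative to `akrun2()`.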

Since you know beforehand how many rows and columns your final data.frame will have, it's much faster to initialize an empty matrix of the right size, fill it, and convert it to a data.frame.
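One side effect of filling a matrix is that every cell must share one type, so `Value` ends up as character even though `TITUL0` is integer. A variant of the same preallocate-then-fill idea that keeps each column's type uses one preallocated vector per column; the `dfMoviesList` here is made-up data in the shape of the sample above:

```r
# Hypothetical data in the same shape as the answer's sample
set.seed(24)
dfMoviesList <- lapply(1:5, function(i) data.frame(TITUL0 = sample(1:5)))

n <- length(dfMoviesList)
ID <- character(n)      # preallocated character column
Value <- integer(n)     # preallocated integer column

for (i in seq_len(n)) {
  ID[i] <- paste0("DF", i)
  Value[i] <- dfMoviesList[[i]]$TITUL0[1]   # the vectors never change length
}

Table <- data.frame(ID = ID, Value = Value, stringsAsFactors = FALSE)
str(Table)
```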

In the for loop you propose, the Table object is rebuilt at every iteration: each `rbind()` call copies all rows accumulated so far into a new data frame, so this gets really slow even when the number of iterations is not very large. See for example:

A sample of 10,000 movies:

dfMoviesList <- lapply(1:10000, function(i) 
    data.frame(TITUL0= sample(1:5)))

Empty-matrix strategy:

system.time({
Table<- matrix(nrow= length(dfMoviesList), ncol= 2, data= NA)

for (i in 1:length(dfMoviesList)){
    ID<- paste0("DF",i)
    Value<- dfMoviesList[[i]]$TITUL0[1]
    Table[i,]<- c(ID, Value)
}
Table<- data.frame(ID= Table[,1], Value= Table[,2])
})
   user  system elapsed 
  0.129   0.001   0.130 

Compare to:

system.time({
op <- options(stringsAsFactors=FALSE)
Table <- data.frame()
for (i in 1:length(dfMoviesList)){
    ID = paste0("DF",i)
    Value = (dfMoviesList[[i]]$TITUL0[1])
    Table <- rbind(Table,c(ID,Value)) 
}
options(op)
})
   user  system elapsed 
 12.316   2.855  15.180 
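Another common pattern, not one of the approaches timed above, also works when the final number of rows is not known up front (for example, when some iterations are skipped): collect each row in a list and call `rbind()` once at the end through `do.call()`, so the data frame is assembled a single time. A sketch with hypothetical data:

```r
# Hypothetical data in the same shape as the answer's sample
set.seed(24)
dfMoviesList <- lapply(1:5, function(i) data.frame(TITUL0 = sample(1:5)))

rows <- list()
for (i in seq_along(dfMoviesList)) {
  # Append a one-row data frame to the list; Table itself is not rebuilt here
  rows[[length(rows) + 1L]] <- data.frame(ID = paste0("DF", i),
                                          Value = dfMoviesList[[i]]$TITUL0[1],
                                          stringsAsFactors = FALSE)
}

# Bind all collected rows in one call
Table <- do.call(rbind, rows)
str(Table)
```

When the row count is known, the preallocated matrix or vectors described above avoid creating the many small one-row data frames as well.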

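Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM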