Combining dataframes returned by a function into a single dataframe in R

Question

I would like to combine multiple dataframes, each the output of a function, into one big dataframe in R.

I am simulating data within a function, e.g.:

set.seed(123)

x <- function(){
return( data.frame( matrix(rnorm(10, 1, .5), ncol=2) ) )
}

I would like to run multiple simulations and bind the resulting dataframes together.

Attempt

set.seed(123)

x_improved <- function(sim_nr){
  df <- data.frame( matrix(rnorm(10, 1, .5), ncol=2) )  # simulate data
  sim_nr <- rep(sim_nr, length(df[,1]))                 # add reference number
  df <- cbind(df, sim_nr)                               # bind columns
  return(df)
}

list_dataframes <- lapply(c(1,2,3), x_improved)         # create list of dataframes

df <- do.call("rbind", list_dataframes)                 # convert list to dataframe

The code above does this; see "Expected output" below.

Expected output:

> df
          X1        X2 sim_nr
1  0.4660881 0.1566533      1
2  0.8910125 1.4188935      1
3  0.4869978 1.0766866      1
4  0.6355544 0.4309315      1
5  0.6874804 1.6269075      1
6  1.2132321 1.3443201      2
7  0.8524643 1.2769588      2
8  1.4475628 0.9690441      2
9  1.4390667 0.8470187      2
10 1.4107905 0.8097645      2
11 0.6526465 0.4384457      3
12 0.8960414 0.7985576      3
13 0.3673018 0.7666723      3
14 2.0844780 1.3899826      3
15 1.6039810 0.9583155      3

Question:

Is this the proper (or idiomatic R) way to address this problem? Are there more efficient (or convenient) solutions?

Answer 1

Another approach would be to use an array, which can be more performant if you need to do a lot of grouping operations.

set.seed(123)
replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))
, , 1

          [,1]      [,2]
[1,] 0.7197622 1.8575325
[2,] 0.8849113 1.2304581
[3,] 1.7793542 0.3674694
[4,] 1.0352542 0.6565736
[5,] 1.0646439 0.7771690

, , 2

          [,1]       [,2]
[1,] 1.6120409 1.89345657
[2,] 1.1799069 1.24892524
[3,] 1.2003857 0.01669142
[4,] 1.0553414 1.35067795
[5,] 0.7220794 0.76360430

, , 3

          [,1]      [,2]
[1,] 0.4660881 0.1566533
[2,] 0.8910125 1.4188935
[3,] 0.4869978 1.0766866
[4,] 0.6355544 0.4309315
[5,] 0.6874804 1.6269075
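
As a rough illustration of the grouping operations mentioned above, the third dimension of the array can serve as the simulation index for apply. This is only a sketch; the names sims, col_means and sim_means are illustrative and not part of the original answer.

```r
set.seed(123)

# 5 x 2 x 3 array: rows, columns, simulations
sims <- replicate(3, matrix(rnorm(10, 1, 0.5), ncol = 2))
dim(sims)

# Column means within each simulation:
# the result has one row per matrix column and one column per simulation
col_means <- apply(sims, 3, colMeans)

# A single overall mean per simulation
sim_means <- apply(sims, 3, mean)

col_means
sim_means
```

Because the simulation number is just the index along the third dimension, no separate sim_nr column is needed for this kind of summary.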

Or, if you want a data.frame, it is often faster to do all of your rnorm simulations at once. Note that even with the seed set, this isn't an exact match: matrix fills column by column, so the ordering is slightly different.

set.seed(123)
n_sim <- 3
data.frame(matrix(rnorm(10 * n_sim, 1, 0.5), ncol = 2),
           sim_nr = rep(seq_len(n_sim), each = 5)
  )
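
If the exact ordering of the per-simulation approach matters, one possible workaround is to reshape the single batch of draws into a 5 x 2 x n_sim array before stacking it. The sketch below assumes R's default random number generator settings, under which one rnorm call of length 30 yields the same stream as three calls of length 10; draws, df_loop and df_once are illustrative names.

```r
# Reference: one rnorm() call per simulation, as in the question
set.seed(123)
df_loop <- do.call(rbind, lapply(1:3, function(i) {
  data.frame(matrix(rnorm(10, 1, 0.5), ncol = 2), sim_nr = i)
}))

# Single rnorm() call, reshaped so that each 5 x 2 slice is one simulation
set.seed(123)
n_sim <- 3
draws <- array(rnorm(10 * n_sim, 1, 0.5), dim = c(5, 2, n_sim))
df_once <- data.frame(
  X1 = c(draws[, 1, ]),  # first column of every simulation, stacked
  X2 = c(draws[, 2, ]),  # second column of every simulation, stacked
  sim_nr = rep(seq_len(n_sim), each = 5)
)

# Compare the two results column by column
all.equal(df_loop$X1, df_once$X1)
all.equal(df_loop$X2, df_once$X2)
```

Under the stated assumption both comparisons should return TRUE, since the array is filled in the same column-major order as the individual matrices.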

Answer 2

One way to improve this, at least in number of lines, is to use transform, so that the function x_improved becomes a one-liner:

set.seed(123)
x_improved <- function(sim_nr){
   transform(data.frame(matrix(rnorm(10, 1, .5), ncol = 2)), sim_nr = sim_nr)
}

do.call(rbind, lapply(1:3, x_improved))


#          X1         X2 sim_nr
#1  0.7197622 1.85753249      1
#2  0.8849113 1.23045810      1
#3  1.7793542 0.36746938      1
#4  1.0352542 0.65657357      1
#5  1.0646439 0.77716901      1
#6  1.6120409 1.89345657      2
#7  1.1799069 1.24892524      2
#8  1.2003857 0.01669142      2
#9  1.0553414 1.35067795      2
#10 0.7220794 0.76360430      2
#11 0.4660881 0.15665334      3
#12 0.8910125 1.41889352      3
#13 0.4869978 1.07668656      3
#14 0.6355544 0.43093153      3
#15 0.6874804 1.62690746      3

Or, depending on your use case, you could construct the dataframe all at once.

num <- 1:3
transform(data.frame(matrix(rnorm(10 * length(num), 1,.5), ncol=2)), 
          sim_nr = rep(num, each = 10/2))

Answer 3

Using the purrr library:

purrr::map_df(c(1,2,3), ~data.frame(matrix(rnorm(10, 1, .5), ncol=2)), .id='sim_nr') 
#Using the x function it would be 
purrr::map_df(c(1,2,3), ~x() , .id='sim_nr')

Answer 4

The simplest solution is to use rbindlist from the data.table package:

> library(data.table)
> rbindlist(list_dataframes)

You can of course do this for your list_dataframes either outside or inside of a "for" loop.
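
As a possible variation, rbindlist can also generate the simulation number itself through its idcol argument, so the function does not have to add a sim_nr column. The sketch below assumes the data.table package is installed; simulate_once is a hypothetical stand-in for the question's simulation function.

```r
# Hypothetical stand-in for the simulation function in the question
simulate_once <- function() {
  data.frame(matrix(rnorm(10, 1, 0.5), ncol = 2))
}

set.seed(123)
list_dataframes <- lapply(1:3, function(i) simulate_once())

# Only run the data.table part if the package is available
if (requireNamespace("data.table", quietly = TRUE)) {
  # idcol adds a column holding the position of each list element
  combined <- data.table::rbindlist(list_dataframes, idcol = "sim_nr")
  print(combined)
}
```

The result is a data.table; as.data.frame(combined) converts it back to a plain data.frame if needed.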

4 answers

Answer 1
3 2019-08-21 11:36:00

Answer 2
2 2019-08-21 08:50:59

Answer 3
1 accepted 2019-08-21 08:54:06

Answer 4
0 2019-08-21 09:05:24
