
R - data.table sample replication

I am trying to create several random subsamples from my sample.

So, I was thinking of something like

library(data.table)
# mtcars is a plain data.frame, so convert it before using .SD and .N
replicate(2, as.data.table(mtcars)[, .SD[sample(.N, 3)]], simplify = FALSE)

This gives me a list of two data.tables:

[[1]]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
2: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
3: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3

[[2]]
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
2: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
3: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1

I am wondering whether it is possible to bind them into a single table with a sampling identifier, i.e. a column recording which replication each row came from:

    mpg cyl  disp  hp drat    wt  qsec vs am gear carb replication
1: 32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1           1
2: 21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1           1
3: 15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3           1
4: 26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2           2
5: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1           2
6: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1           2

Something like

library(dplyr)
replicate(2, as.data.table(mtcars)[, .SD[sample(.N, 3)]], simplify = FALSE) %>% bind_rows()

but with a column indicating the replication number (and, of course, avoiding explicit loops).

Thanks

bind_rows() has an .id argument that adds a column identifying which list element each row came from:

replicate(2, as.data.table(mtcars)[sample(.N, 3)], simplify = FALSE) %>%
  bind_rows(.id = 'replication')
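
If you would rather stay within data.table, a minimal sketch using rbindlist() with its idcol argument could look like the following. The names dt, samples and res, and the set.seed() call, are illustrative assumptions and not part of the original post.

```r
library(data.table)

set.seed(42)  # assumption: fixed seed only so the sketch is reproducible

# mtcars ships as a data.frame; convert once so .N is available in i
dt <- as.data.table(mtcars)

# Draw two independent subsamples of 3 rows each, kept as a list
samples <- replicate(2, dt[sample(.N, 3)], simplify = FALSE)

# Stack the list; for an unnamed list, idcol records the list position
res <- rbindlist(samples, idcol = "replication")
res
```

Because the list returned by replicate() is unnamed, the replication column holds the position of each subsample in the list, so no explicit loop is needed.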
