
dplyr, purrr or similar process to replace for loop in R

I have a distribution of groups with a given number of individuals; for example: {2, 4, 1, 1, 2, 3}, where Group 1 has 2 individuals, Group 2 has 4 individuals, Group 3 has 1, Group 4 has 1, etc. I want to build a table where there is a unique row for each Group/Individual combination. (Desired table format at the bottom of the question).

I currently use a for loop:

num.groups <- 10
mu <- 4
sd <- 1

group.dist <- round(rnorm(num.groups, mean = mu, sd = sd))

xx <- NULL
for (i in 1:length(group.dist)) {
  temp <- data.frame(Group = i, Individual = 1:group.dist[i])
  xx <- rbind(xx, temp)
}

I'm trying to get away from using for loops in general, and the actual version of my code has hundreds of groups and I will be running a simulation thousands of times, so I'm hoping there is a more efficient method of doing this.

I apologize if someone has already asked this question; it is a very specific situation and difficult to search for. Thanks!

The table would look like this: [image: enter image description here]

For example:

library(tidyverse)
d <- tibble(Group = seq_along(group.dist), n = group.dist)

uncount(d, n, .id = 'Individual')
# A tibble: 45 × 2
# Groups:   Group [10]
   Group Individual
   <int>      <int>
 1     1          1
 2     1          2
 3     1          3
 4     1          4
 5     2          1
 6     2          2
 7     2          3
 8     2          4
 9     3          1
10     3          2
# … with 35 more rows

Here are two more approaches:

library(data.table)
data.table(Group=1:num.groups)[, .(Individual = seq(1,group.dist[.BY$Group])), by=Group]

OR:

do.call(rbind, lapply(1:num.groups, function(x) data.frame("Group" = x, Individual = 1:group.dist[x])))
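
As a side note that is not part of the original answers, the same table can probably be built in base R without any loop or apply call, by pairing rep() with sequence(). The sketch below is a minimal illustration under the assumption that every group size is a non-negative whole number; it uses the fixed example distribution {2, 4, 1, 1, 2, 3} from the question instead of the random draw so the result is reproducible.

```r
# Group sizes from the question's example:
# Group 1 has 2 individuals, Group 2 has 4, Group 3 has 1, and so on.
group.dist <- c(2, 4, 1, 1, 2, 3)

xx <- data.frame(
  # Repeat each group index once per individual in that group.
  Group      = rep(seq_along(group.dist), times = group.dist),
  # sequence() concatenates 1:n for each n in group.dist.
  Individual = sequence(group.dist)
)

print(xx)
```

One design point worth noting: a group of size 0 simply contributes no rows here, whereas 1:group.dist[i] in the loop evaluates 1:0, which counts down to c(1, 0) and would add two spurious rows. With mean 4 and sd 1 a rounded draw of 0 is unlikely, but it is possible.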

Another possible solution, based on dplyr::group_modify:

library(tidyverse)

num.groups <- 10
mu <- 4
sd <- 1

group.dist <- round(rnorm(num.groups, mean = mu, sd = sd))

data.frame(Group = rep(1:num.groups, group.dist)) %>% 
  group_by(Group) %>% 
  group_modify(~ add_column(.x, Individual = 1:nrow(.x))) %>% 
  ungroup

#> # A tibble: 39 x 2
#>    Group Individual
#>    <int>      <int>
#>  1     1          1
#>  2     1          2
#>  3     1          3
#>  4     1          4
#>  5     2          1
#>  6     2          2
#>  7     2          3
#>  8     2          4
#>  9     3          1
#> 10     3          2
#> # ... with 29 more rows

Or, even better, following @Axeman's suggestion:

data.frame(Group = rep(1:num.groups, group.dist)) %>% 
  group_by(Group) %>% 
  mutate(Individual = row_number()) %>% 
  ungroup()
