R: How to split a row in a dataframe into a number of rows, conditional on a value in a cell?

Question

I have a data.frame which looks like the following:

id <- c("a","a","a","a","b","b","b","b")
age_from <- c(0,2,3,7,0,1,2,6)
age_to <- c(2,3,7,10,1,2,6,10)
y <- c(100,150,100,250,300,200,100,150)
df <- data.frame(id,age_from,age_to,y)
df$years <- df$age_to - df$age_from

Which gives a df that looks like:

     id   age_from  age_to     y      years
1     a       0       2       100       2
2     a       2       3       150       1
3     a       3       7       100       4
4     a       7       10      250       3
5     b       0       1       300       1
6     b       1       2       200       1
7     b       2       6       100       4
8     b       6       10      150       4

Instead of having an unequal number of years per row, I would like to have 20 rows, 10 for each id, with each row accounting for one year. This would also involve averaging the y column across the number of years listed in the years column.

I believe this may have to be done with a loop over 1:n, where n is the value in the years column, but I am not sure how to start.

Answer 1

You can use rep to repeat each row by its number of years.

x <- df[rep(seq_len(nrow(df)), df$years),]
x
#    id age_from age_to         y years
#1    a        0      2  50.00000     2
#1.1  a        0      2  50.00000     2
#2    a        2      3 150.00000     1
#3    a        3      7  25.00000     4
#3.1  a        3      7  25.00000     4
#3.2  a        3      7  25.00000     4
#3.3  a        3      7  25.00000     4
#4    a        7     10  83.33333     3
#4.1  a        7     10  83.33333     3
#4.2  a        7     10  83.33333     3
#5    b        0      1 300.00000     1
#6    b        1      2 200.00000     1
#7    b        2      6  25.00000     4
#7.1  b        2      6  25.00000     4
#7.2  b        2      6  25.00000     4
#7.3  b        2      6  25.00000     4
#8    b        6     10  37.50000     4
#8.1  b        6     10  37.50000     4
#8.2  b        6     10  37.50000     4
#8.3  b        6     10  37.50000     4

If by averaging the y column across the number of years you mean dividing it by the number of years (the output above already shows y after this step):

x$y <- x$y / x$years

In case age_from should go from 0 to 9 and age_to from 1 to 10 for each id:

x$age_from <- x$age_from + ave(x$age_from, x$id, x$age_from, FUN=seq_along) - 1
#x$age_from <- ave(x$age_from, x$id, FUN=seq_along) - 1 #Alternative
x$age_to <- x$age_from + 1
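
As a sanity check, the steps above can be put together and verified end to end. This is only a sketch that rebuilds the data from the question and uses the commented-out alternative for age_from; the names rows_per_id, totals_before and totals_after are introduced here for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id       = rep(c("a", "b"), each = 4),
  age_from = c(0, 2, 3, 7, 0, 1, 2, 6),
  age_to   = c(2, 3, 7, 10, 1, 2, 6, 10),
  y        = c(100, 150, 100, 250, 300, 200, 100, 150)
)
df$years <- df$age_to - df$age_from

# Repeat each row once per year, then spread y evenly over those years.
x <- df[rep(seq_len(nrow(df)), df$years), ]
x$y <- x$y / x$years

# Renumber the intervals so each row covers exactly one year.
x$age_from <- ave(x$age_from, x$id, FUN = seq_along) - 1
x$age_to   <- x$age_from + 1
rownames(x) <- NULL

# Check 1: every id should now have one row per year.
rows_per_id <- table(x$id)

# Check 2: dividing y should not change the per-id totals.
totals_before <- tapply(df$y, df$id, sum)
totals_after  <- tapply(x$y, x$id, sum)
print(rows_per_id)
print(all.equal(totals_before, totals_after))
```

Both ids should end up with 10 rows, and the per-id totals of y (600 for a, 750 for b) should be the same before and after the split, which is what dividing by years is meant to guarantee.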

Answer 2

Here is a solution with tidyr and dplyr.

First, we complete age_from from 0 to 9 for each existing id, as you wanted.

This leaves several NAs in age_to, y and years, so we fill them by carrying each value down into the NA rows that immediately follow it.

Now you can divide y by years (I assumed this is what you meant by averaging, so that the sum stays the same).

At that point, you only need to recalculate age_to accordingly.

Remember to ungroup at the end!

library(tidyr)
library(dplyr)

df %>%
  complete(id, age_from = 0:9) %>%
  group_by(id) %>%
  fill(y, years, age_to) %>%
  mutate(y = y / years) %>%
  mutate(age_to = age_from + 1) %>%
  ungroup()

# A tibble: 20 x 5
   id    age_from age_to     y years
   <chr>    <dbl>  <dbl> <dbl> <dbl>
 1 a            0      1  50       2
 2 a            1      2  50       2
 3 a            2      3 150       1
 4 a            3      4  25       4
 5 a            4      5  25       4
 6 a            5      6  25       4
 7 a            6      7  25       4
 8 a            7      8  83.3     3
 9 a            8      9  83.3     3
10 a            9     10  83.3     3
11 b            0      1 300       1
12 b            1      2 200       1
13 b            2      3  25       4
14 b            3      4  25       4
15 b            4      5  25       4
16 b            5      6  25       4
17 b            6      7  37.5     4
18 b            7      8  37.5     4
19 b            8      9  37.5     4
20 b            9     10  37.5     4
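
To make the fill step more concrete, here is a small base R sketch of the same carry-forward idea on toy vectors shaped like y and years for id a after complete(). It is not how tidyr implements fill(); the helper vectors last_seen, y_filled and years_filled are hypothetical names used only for this illustration, and the sketch assumes the first element is not NA.

```r
# Toy columns with gaps, shaped like id "a" after complete():
# values exist only at age_from 0, 2, 3 and 7.
y     <- c(100, NA, 150, 100, NA, NA, NA, 250, NA, NA)
years <- c(2,   NA, 1,   4,   NA, NA, NA, 3,   NA, NA)

# For each position, record the index of the most recent non-NA entry.
# NA positions contribute 0, and cummax() carries the last real index forward.
# Assumption: y[1] is not NA, so no index is ever 0.
last_seen <- cummax(ifelse(is.na(y), 0L, seq_along(y)))

# Index by those positions to drag each value down over the NAs.
y_filled     <- y[last_seen]
years_filled <- years[last_seen]

# Spread each filled value evenly over its number of years.
y_per_year <- y_filled / years_filled
print(y_per_year)
```

The filled y column repeats 100, 150, 100 and 250 over 2, 1, 4 and 3 positions respectively, so dividing by the filled years keeps the total at 600, as in the tibble above.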

Answer 3

A tidyverse solution.

library(tidyverse)

df %>%
  mutate(age_to = age_from + 1) %>% 
  group_by(id) %>% 
  complete(nesting(age_from = 0:9, age_to = 1:10)) %>%
  fill(y, years) %>%
  mutate(y = y / years)

# A tibble: 20 x 5
# Groups:   id [2]
   id    age_from age_to     y years
   <chr>    <dbl>  <dbl> <dbl> <dbl>
 1 a            0      1  50       2
 2 a            1      2  50       2
 3 a            2      3 150       1
 4 a            3      4  25       4
 5 a            4      5  25       4
 6 a            5      6  25       4
 7 a            6      7  25       4
 8 a            7      8  83.3     3
 9 a            8      9  83.3     3
10 a            9     10  83.3     3
11 b            0      1 300       1
12 b            1      2 200       1
13 b            2      3  25       4
14 b            3      4  25       4
15 b            4      5  25       4
16 b            5      6  25       4
17 b            6      7  37.5     4
18 b            7      8  37.5     4
19 b            8      9  37.5     4
20 b            9     10  37.5     4

3 answers

solution1
3 ACCEPTED 2020-08-17 12:59:12

solution2
2 2020-08-17 12:51:19

solution3
1 2020-08-17 13:12:19
