
R: How to split a row in a dataframe into a number of rows, conditional on a value in a cell?

I have a data.frame that looks like the following:

id <- c("a","a","a","a","b","b","b","b")
age_from <- c(0,2,3,7,0,1,2,6)
age_to <- c(2,3,7,10,1,2,6,10)
y <- c(100,150,100,250,300,200,100,150)
df <- data.frame(id,age_from,age_to,y)
df$years <- df$age_to - df$age_from

This gives a df that looks like:

     id   age_from  age_to     y      years
1     a       0       2       100       2
2     a       2       3       150       1
3     a       3       7       100       4
4     a       7       10      250       3
5     b       0       1       300       1
6     b       1       2       200       1
7     b       2       6       100       4
8     b       6       10      150       4

Instead of having an unequal number of years per row, I would like to have 20 rows, 10 for each id, with each row accounting for one year. This would also involve averaging the y column across the number of years listed in the years column.

I believe this may have to be done using a loop 1:n, with n equal to the value in the years column, although I am not sure how to start with this.

You can use rep to repeat each row by the number of years given in the years column.

x <- df[rep(seq_len(nrow(df)), df$years),]
x
# Note: the y values printed here are those after the division by years shown below.
#    id age_from age_to         y years
#1    a        0      2  50.00000     2
#1.1  a        0      2  50.00000     2
#2    a        2      3 150.00000     1
#3    a        3      7  25.00000     4
#3.1  a        3      7  25.00000     4
#3.2  a        3      7  25.00000     4
#3.3  a        3      7  25.00000     4
#4    a        7     10  83.33333     3
#4.1  a        7     10  83.33333     3
#4.2  a        7     10  83.33333     3
#5    b        0      1 300.00000     1
#6    b        1      2 200.00000     1
#7    b        2      6  25.00000     4
#7.1  b        2      6  25.00000     4
#7.2  b        2      6  25.00000     4
#7.3  b        2      6  25.00000     4
#8    b        6     10  37.50000     4
#8.1  b        6     10  37.50000     4
#8.2  b        6     10  37.50000     4
#8.3  b        6     10  37.50000     4
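
To see why this indexing works, here is a minimal sketch on toy data (the names times, idx, toy and expanded are illustrative and not part of the answer). Given a vector of counts, rep repeats each element that many times, so indexing the rows of a data frame with the result duplicates each row accordingly.

```r
# Toy illustration: repeat row indices according to a vector of counts.
times <- c(2, 1, 3)               # hypothetical counts, one per row
idx <- rep(seq_along(times), times)
idx
# [1] 1 1 2 3 3 3

# Indexing a data frame with idx duplicates its rows in the same pattern.
toy <- data.frame(k = c("p", "q", "r"), n = times)
expanded <- toy[idx, ]
nrow(expanded)
# [1] 6
```

The duplicated rows get row names such as 1.1 and 3.2, which is where the row labels in the output above come from.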

If by averaging the y column across the number of years you mean dividing y by the number of years:

x$y <- x$y / x$years
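
As a quick sanity check (a sketch that assumes the goal is to spread each y evenly over its years, so that the total per id is preserved), the per-id sums of y can be compared before and after the split. The names before and after are illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  id = rep(c("a", "b"), each = 4),
  age_from = c(0, 2, 3, 7, 0, 1, 2, 6),
  age_to = c(2, 3, 7, 10, 1, 2, 6, 10),
  y = c(100, 150, 100, 250, 300, 200, 100, 150)
)
df$years <- df$age_to - df$age_from

# Repeat each row once per year, then spread y evenly over those years.
x <- df[rep(seq_len(nrow(df)), df$years), ]
x$y <- x$y / x$years

# The per-id totals of y should be unchanged by the split.
before <- tapply(df$y, df$id, sum)
after <- tapply(x$y, x$id, sum)
all.equal(before, after)
# [1] TRUE
```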

In case age_from should go from 0 to 9 and age_to from 1 to 10 for each id:

x$age_from <- x$age_from + ave(x$age_from, x$id, x$age_from, FUN=seq_along) - 1
#x$age_from <- ave(x$age_from, x$id, FUN=seq_along) - 1 #Alternative
x$age_to <- x$age_from + 1
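
The ave call is the least obvious part, so here is a minimal sketch of what it does on toy vectors (g, v and pos are illustrative names, not from the answer). With FUN = seq_along, ave numbers the elements within each group. Adding that position minus one to a repeated start value turns it into consecutive values.

```r
# Toy illustration of ave() with seq_along: number the elements within each group.
g <- c("a", "a", "a", "b", "b")   # hypothetical grouping vector
v <- c(10, 10, 10, 20, 20)        # repeated start values, as produced by rep-ing rows
pos <- ave(v, g, FUN = seq_along)
pos
# [1] 1 2 3 1 2

# Offsetting the start value by (position - 1) gives consecutive values per group.
v + pos - 1
# [1] 10 11 12 20 21
```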

Here is a solution with tidyr and dplyr.

First, we complete age_from from 0 to 9 as you wanted, keeping only the existing ids.

You will then have several NAs in age_to, y and years. So we fill them by carrying each value down into the NA entries that immediately follow it.

Now you can divide y by years (I assumed this is what you meant by averaging, so that the sum stays consistent).

At that point, you only need to recalculate age_to accordingly.

Remember to ungroup at the end!

library(tidyr)
library(dplyr)

df %>%
  complete(id, age_from = 0:9) %>%
  group_by(id) %>%
  fill(y, years, age_to) %>%
  mutate(y = y / years) %>%
  mutate(age_to = age_from + 1) %>%
  ungroup()
# A tibble: 20 x 5
   id    age_from age_to     y years
   <chr>    <dbl>  <dbl> <dbl> <dbl>
 1 a            0      1  50       2
 2 a            1      2  50       2
 3 a            2      3 150       1
 4 a            3      4  25       4
 5 a            4      5  25       4
 6 a            5      6  25       4
 7 a            6      7  25       4
 8 a            7      8  83.3     3
 9 a            8      9  83.3     3
10 a            9     10  83.3     3
11 b            0      1 300       1
12 b            1      2 200       1
13 b            2      3  25       4
14 b            3      4  25       4
15 b            4      5  25       4
16 b            5      6  25       4
17 b            6      7  37.5     4
18 b            7      8  37.5     4
19 b            8      9  37.5     4
20 b            9     10  37.5     4
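
For comparison, a more direct route is sketched below with tidyr's uncount, which repeats each row according to a weight column. This is an alternative technique rather than the method of the answer above; the helper column k and the name res are assumptions made for illustration.

```r
library(tidyr)
library(dplyr)

# Rebuild the example data from the question.
df <- data.frame(
  id = rep(c("a", "b"), each = 4),
  age_from = c(0, 2, 3, 7, 0, 1, 2, 6),
  age_to = c(2, 3, 7, 10, 1, 2, 6, 10),
  y = c(100, 150, 100, 250, 300, 200, 100, 150)
)
df$years <- df$age_to - df$age_from

res <- df %>%
  mutate(y = y / years) %>%                        # spread y evenly over the years
  uncount(years, .remove = FALSE, .id = "k") %>%   # k runs from 1 to years within each original row
  mutate(age_from = age_from + k - 1,              # shift the start by one year per copy
         age_to = age_from + 1) %>%
  select(-k)

nrow(res)
# [1] 20
```

Because each copy is offset from its own original row, this sketch does not depend on filling NA values downwards.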

A tidyverse solution.

library(tidyverse)

df %>%
  mutate(age_to = age_from + 1) %>% 
  group_by(id) %>% 
  complete(nesting(age_from = 0:9, age_to = 1:10)) %>%
  fill(y, years) %>%
  mutate(y = y / years)

# A tibble: 20 x 5
# Groups:   id [2]
   id    age_from age_to     y years
   <chr>    <dbl>  <dbl> <dbl> <dbl>
 1 a            0      1  50       2
 2 a            1      2  50       2
 3 a            2      3 150       1
 4 a            3      4  25       4
 5 a            4      5  25       4
 6 a            5      6  25       4
 7 a            6      7  25       4
 8 a            7      8  83.3     3
 9 a            8      9  83.3     3
10 a            9     10  83.3     3
11 b            0      1 300       1
12 b            1      2 200       1
13 b            2      3  25       4
14 b            3      4  25       4
15 b            4      5  25       4
16 b            5      6  25       4
17 b            6      7  37.5     4
18 b            7      8  37.5     4
19 b            8      9  37.5     4
20 b            9     10  37.5     4
