How to fill in a missing column based on another column in R

Question

I have a subset of a data frame, shown below. I want to fill the NAs in the column "age at disease" so that the sibling without the disease gets the same age as the sibling with the disease (siblings are identified by familyId).

df <- structure(list(id = c(1, 2, 3, 4, 5, 6),
                     familyId = c(1, 1, 2, 2, 3, 3),
                     disease = c(1, 0, 0, 1, 1, 0),
                     `age at disease` = c("40", "NA", "NA", "43", "52", "NA")),
                class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

This means the last column, "age at disease", should become c(40, 40, 43, 43, 52, 52).
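Note that in the `structure()` output above, the missing ages are the literal string "NA", not real `NA` values, so `is.na()` does not detect them. A minimal base-R check (the names `age_chr` and `age` are just for illustration):

```r
age_chr <- c("40", "NA", "NA", "43", "52", "NA")

# The string "NA" is an ordinary character value, not a missing value
is.na(age_chr)
#> [1] FALSE FALSE FALSE FALSE FALSE FALSE

# type.convert() treats "NA" as missing (na.strings = "NA" by default)
# and, with as.is = TRUE, returns an integer vector for this input
age <- type.convert(age_chr, as.is = TRUE)
age
#> [1] 40 NA NA 43 52 NA
```

This is why the first two answers call `na_if()` or `type.convert()` before filling.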

Answer 1

You can use the following code:

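library(dplyr)
library(tidyr)

df %>%
  # turn the literal string "NA" into a real NA
  mutate(`age at disease` = na_if(`age at disease`, "NA")) %>%
  group_by(familyId) %>%
  # fill down, then up, within each family
  fill(`age at disease`) %>%
  fill(`age at disease`, .direction = "up")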

Output:

# A tibble: 6 × 4
# Groups:   familyId [3]
     id familyId disease `age at disease`
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     3        2       0 43              
4     4        2       1 43              
5     5        3       1 52              
6     6        3       0 52
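The two `fill()` calls fill each family downwards and then upwards. A minimal base-R sketch of the same down-then-up idea, using the hypothetical helpers `fill_down()` and `fill_downup()` and applying them per family with `ave()`; it assumes the "NA" strings have already been turned into real `NA` values:

```r
# Hypothetical helper: carry the last non-missing value forward
fill_down <- function(x) {
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) x[i] <- x[i - 1]
  }
  x
}

# Hypothetical helper: fill down, then fill up (by filling the reversed vector down)
fill_downup <- function(x) rev(fill_down(rev(fill_down(x))))

age    <- c(40L, NA, NA, 43L, 52L, NA)
family <- c(1, 1, 2, 2, 3, 3)

# ave() applies fill_downup() within each family and keeps the original row order
filled <- ave(age, family, FUN = fill_downup)
filled
#> [1] 40 40 43 43 52 52
```

In tidyr itself, the two calls can also be combined into a single `fill(`age at disease`, .direction = "downup")`.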

Answer 2

If there is only a single non-NA element per group, we can also do:

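library(dplyr)

df %>%
   # convert the "NA" strings to real NA values
   type.convert(as.is = TRUE) %>%
   group_by(familyId) %>%
   # replace every value with the first non-missing value in the family
   mutate(`age at disease` = `age at disease`[complete.cases(`age at disease`)][1]) %>%
   ungroup()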

-output

# A tibble: 6 × 4
     id familyId disease `age at disease`
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     3        2       0 43              
4     4        2       1 43              
5     5        3       1 52              
6     6        3       0 52
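The core of this answer is the subsetting idiom `x[complete.cases(x)][1]`, which returns the first non-missing value of a group (or `NA` if the group has none). A base-R sketch of the same per-family replacement with `ave()`; the helper `first_non_na()` is hypothetical, and for a single vector `!is.na(x)` selects the same elements as `complete.cases(x)`:

```r
# Hypothetical helper: first non-missing value, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

age    <- c(40L, NA, NA, 43L, 52L, NA)
family <- c(1, 1, 2, 2, 3, 3)

# ave() recycles the single value returned per family over that family's rows
filled <- ave(age, family, FUN = first_non_na)
filled
#> [1] 40 40 43 43 52 52

# The caveat in the answer: with several non-NA values, only the first is kept
first_non_na(c(NA, 50L, 41L))
#> [1] 50
```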

Answer 3

Here is another dplyr approach (note that it reorders the rows within each family):

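library(dplyr)

df %>%
  group_by(familyId) %>%
  # sort each family so the known age comes first (the string "NA" sorts after digits)
  arrange(`age at disease`, .by_group = TRUE) %>%
  # copy that first value to every row of the family
  mutate(`age at disease` = first(`age at disease`))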

     id familyId disease `age at disease`
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     4        2       1 43              
4     3        2       0 43              
5     5        3       1 52              
6     6        3       0 52
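This approach relies on sort order: within each family, the known age sorts before the missing one, so `first()` picks it up. A base-R sketch of the same idea that keeps the original row order; `first_after_sort()` is a hypothetical helper, and `method = "radix"` is used so character strings are compared in the C locale, where digits sort before the letters of "NA":

```r
# Hypothetical helper: sort the values and return the first one
first_after_sort <- function(x) x[order(x, method = "radix")][1]

age_chr <- c("40", "NA", "NA", "43", "52", "NA")  # literal "NA" strings, as in the question
family  <- c(1, 1, 2, 2, 3, 3)

filled <- ave(age_chr, family, FUN = first_after_sort)
filled
#> [1] "40" "40" "43" "43" "52" "52"

# With real NA values, order() also places NA last (na.last = TRUE by default)
first_after_sort(c(NA, 43L))
#> [1] 43
```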


3 answers:

Answer 1: score 2, 2022-05-14 15:40:49
Answer 2: score 2, 2022-05-14 15:57:16
Answer 3: score 2, 2022-05-14 16:24:57
