
How to fill missing values in one column based on another column in R

I have a subset of a data frame, shown below. I want to fill the NAs in the column "age at disease" so that the sibling without the disease (siblings are identified by familyId) gets the same age as the family member with the disease.

df <- structure(list(id = c(1, 2, 3, 4, 5, 6), 
           familyId = c(1, 1, 2, 2, 3, 3), 
           disease = c(1, 0, 0, 1, 1, 0), 
           `age at disease` = c("40","NA", "NA", "43", "52", "NA")), 
      class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -6L))

This means that the last column, "age at disease", should be: c(40, 40, 43, 43, 52, 52).

You can use the following code:

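library(dplyr)
library(tidyr)
df %>%
  # turn the literal "NA" strings into real missing values
  mutate(`age at disease` = na_if(`age at disease`, "NA")) %>%
  group_by(familyId) %>%
  # fill down, then up, within each family
  fill(`age at disease`, .direction = "downup")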

Output:

# A tibble: 6 × 4
# Groups:   familyId [3]
     id familyId disease `age at disease`
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     3        2       0 43              
4     4        2       1 43              
5     5        3       1 52              
6     6        3       0 52  
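The output above is still a character column, while the question asks for numeric ages. As a minimal, self-contained sketch (the `filled` name and the `data.frame()` reconstruction of the sample data are assumptions made for illustration), the column can be converted to numeric before filling:

```r
library(dplyr)
library(tidyr)

# Sample data rebuilt from the question; check.names = FALSE keeps the
# column name with spaces intact.
df <- data.frame(
  id = 1:6,
  familyId = c(1, 1, 2, 2, 3, 3),
  disease = c(1, 0, 0, 1, 1, 0),
  `age at disease` = c("40", "NA", "NA", "43", "52", "NA"),
  check.names = FALSE
)

filled <- df %>%
  # "NA" strings become real NA first, then the column becomes numeric
  mutate(`age at disease` = as.numeric(na_if(`age at disease`, "NA"))) %>%
  group_by(familyId) %>%
  # fill in both directions within each family
  fill(`age at disease`, .direction = "downup") %>%
  ungroup()

filled$`age at disease`
# 40 40 43 43 52 52
```

Calling `ungroup()` at the end is a design choice: it returns a plain data frame so that later steps are not accidentally computed per family.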

If there is only a single non-NA element per group, we may also do:

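library(dplyr)
df %>%
   # re-infer column types, so the "NA" strings become real NA values
   type.convert(as.is = TRUE) %>%
   group_by(familyId) %>%
   # take the first non-missing age in each family and recycle it
   mutate(`age at disease` = `age at disease`[!is.na(`age at disease`)][1]) %>% 
   ungroup()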

Output:

# A tibble: 6 × 4
     id familyId disease `age at disease`
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     3        2       0 43              
4     4        2       1 43              
5     5        3       1 52              
6     6        3       0 52       
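To illustrate the "single non-NA element per group" condition, here is a hedged sketch of the same first-non-missing idea on slightly extended data. Family 4 and the names `df2` and `res` are hypothetical additions, not part of the question; they show what happens when a family has no known age at all:

```r
library(dplyr)

# Question data plus a hypothetical family 4 with no recorded age
df2 <- data.frame(
  id = 1:8,
  familyId = c(1, 1, 2, 2, 3, 3, 4, 4),
  `age at disease` = c("40", "NA", "NA", "43", "52", "NA", "NA", "NA"),
  check.names = FALSE
)

res <- df2 %>%
  # make the "NA" strings real missing values and the column numeric
  mutate(`age at disease` = as.numeric(na_if(`age at disease`, "NA"))) %>%
  group_by(familyId) %>%
  # first non-missing value per family; NA if the family has none
  mutate(`age at disease` = `age at disease`[!is.na(`age at disease`)][1]) %>%
  ungroup()

res$`age at disease`
# 40 40 43 43 52 52 NA NA
```

If a family had several different known ages, this sketch would silently overwrite all of them with the first one, so it is only appropriate when each family has at most one distinct value.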

Here is another dplyr approach:

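library(dplyr)
df %>%
  group_by(familyId) %>% 
  # within each family, sort so the known age comes before the "NA" string
  arrange(`age at disease`, .by_group = TRUE) %>% 
  # then copy that first value to every row of the family
  mutate(`age at disease` = first(`age at disease`))

Output:

     id familyId disease `age at disease`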
  <dbl>    <dbl>   <dbl> <chr>           
1     1        1       1 40              
2     2        1       0 40              
3     4        2       1 43              
4     3        2       0 43              
5     5        3       1 52              
6     6        3       0 52 

