Fill NA with unique character value in R

I'm trying to generate a unique ID for each NA in my data. An example will make this clearer.

library(dplyr)
tbl <-  tibble(ID = c(rep("A", 3), rep("B", 3)),
               SecondID = c("AAA", NA, NA, "BBB", "BBB", NA),
               ThirdID = c("CCC", NA, NA, "DDD", "DDD", NA))

I need unique values for SecondID and ThirdID. Here is how I solved it, but sample() is called separately for every row, so replace=FALSE has no effect across rows. Sooner or later it will return the same number twice within a group.

tbl %>% 
  rowwise() %>% 
  mutate(SecondID = if_else(is.na(SecondID), paste0(ID, ".", sample(1:100, 1, replace=FALSE)), SecondID)) %>% 
  ungroup() %>% 
  mutate(ThirdID = if_else(is.na(ThirdID), paste0(SecondID, ".1"), ThirdID))

  ID    SecondID ThirdID
  <chr> <chr>    <chr>  
1 A     AAA      CCC    
2 A     A.54     A.54.1 
3 A     A.65     A.65.1 
4 B     BBB      DDD    
5 B     BBB      DDD    
6 B     B.8      B.8.1  

Is there a foolproof method for creating these IDs?

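Draw all the values with a single call to sample() instead of calling it row by row under rowwise(). Sampling without replacement in one call guarantees that the numbers differ. Note that sample(1:100, n) fails when n exceeds 100, so widen the range if there can be more than 100 missing values.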

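library(dplyr)

tbl %>%
  mutate(
    # One sample() call covers every missing SecondID, so the draws cannot repeat
    SecondID = replace(SecondID, is.na(SecondID),
                       paste(ID[is.na(SecondID)],
                             sample(1:100, sum(is.na(SecondID))),
                             sep = ".")),
    # SecondID is already filled at this point, so ThirdID can be derived from it
    ThirdID = ifelse(is.na(ThirdID), paste0(SecondID, ".1"), ThirdID)
  )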

# ID    SecondID ThirdID
#  <chr> <chr>    <chr>  
#1 A     AAA      CCC    
#2 A     A.29     A.29.1 
#3 A     A.13     A.13.1 
#4 B     BBB      DDD    
#5 B     BBB      DDD    
#6 B     B.22     B.22.1 
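
If the IDs do not have to be random, a per-group counter avoids sample() altogether. This is a different technique from the answer above, shown only as a minimal sketch. It assumes that no existing SecondID already has the form "A.1", "B.1" and so on, and the name `filled` is introduced here just for illustration.

```r
library(dplyr)

tbl <- tibble(ID = c(rep("A", 3), rep("B", 3)),
              SecondID = c("AAA", NA, NA, "BBB", "BBB", NA),
              ThirdID = c("CCC", NA, NA, "DDD", "DDD", NA))

filled <- tbl %>%
  group_by(ID) %>%
  # cumsum(is.na(...)) numbers the missing rows 1, 2, 3, ... within each ID
  mutate(SecondID = if_else(is.na(SecondID),
                            paste0(ID, ".", cumsum(is.na(SecondID))),
                            SecondID)) %>%
  ungroup() %>%
  # Same rule as in the question: derive ThirdID from the filled SecondID
  mutate(ThirdID = if_else(is.na(ThirdID), paste0(SecondID, ".1"), ThirdID))

filled
```

Because the counter restarts for each ID and the ID is part of the generated value, the generated IDs cannot collide with each other, and the result is the same on every run.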
