How do I grab the first non-NA column matching a string if a certain column is NA?

I haven't seen a similar question for this issue (it's pretty specific).

I have three people who can be chosen to answer questions about a patient. Only two of those three are actually used for any particular patient (in my real data it is always two people, but they are chosen from a pool of 10).

If the initial two people disagree, a third person is used (3rdOpinion) and that opinion overrides the others.

Therefore, the final result = the 3rd opinion result, UNLESS the initial two opinions are the same (i.e. 3rdOpinion is NA), in which case the final result is just the opinion given by the initial two people (i.e. the first non-NA value for that question for that patient).

So, for example, for patient 1, question 1, Ben and Chris disagreed, so 3rdOpinion was used as the final result.

For question 2, patient 2, Adam and Chris both said "Yes", so the final result is "Yes" and the 3rd opinion wasn't used.

How can I code my data to give the last two columns, Question1_final and Question2_final?

#Code to reproduce the data with the desired last two columns:
Patient <- c("1","2","3")
Question1_Adam <- c(NA,"Yes","No")
Question2_Adam <- c(NA,"Yes","No")
Question1_Ben <- c("Yes",NA,"Unlikely")
Question2_Ben <- c("No",NA,"No")
Question1_Chris <- c("Probably","Probably",NA)
Question2_Chris <- c("Unlikely","Yes",NA)
Question1_3rdOpinion <- c("Probably","Yes","No")
Question2_3rdOpinion <- c("No",NA,NA)
Question1_final <- c("Probably","Yes","No")
Question2_final <- c("No","Yes","No")
df <- data.frame(Patient, Question1_Adam, Question2_Adam, Question1_Ben, Question2_Ben, Question1_Chris, Question2_Chris, Question1_3rdOpinion, Question2_3rdOpinion, Question1_final, Question2_final)

I figured I needed something like this, but I'm not sure how to code the last part:

df <- transform(df, Q1_final = ifelse(!is.na(Question1_3rdOpinion), Question1_3rdOpinion, *here I would somehow grep the first non-NA question 1 value*))

One way, using dplyr and tidyr, is to reshape the data to long format and separate the question and the person into different columns. Then, for each Patient and Question, if the first two values are the same, take that value; otherwise take the third value.

library(dplyr)
library(tidyr)

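# Assumes df holds only the input columns (no *_final columns yet).
df %>%
  # Long format; dropping NAs leaves the two raters, then 3rdOpinion if present
  pivot_longer(cols = -Patient, values_drop_na = TRUE) %>%
  separate(name, into = c("Question", "name"), sep = "_") %>%
  group_by(Patient, Question) %>%
  # Raters agree: take their answer; otherwise take the third value (3rdOpinion)
  summarise(ans = if (value[1L] == value[2L]) value[1L] else value[3L]) %>%
  mutate(Question = paste0(Question, "_final")) %>%
  pivot_wider(names_from = Question, values_from = ans) %>%
  left_join(df, by = "Patient")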

# Patient Question1_final Question2_final Question1_Adam Question2_Adam
#  <fct>   <fct>           <fct>           <fct>          <fct>         
#1 1       Probably        No              NA             NA            
#2 2       Yes             Yes             Yes            Yes           
#3 3       No              No              No             No            
# … with 6 more variables: Question1_Ben <fct>, Question2_Ben <fct>,
#   Question1_Chris <fct>, Question2_Chris <fct>, Question1_3rdOpinion <fct>,
#   Question2_3rdOpinion <fct>
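
The rule in the question ("use 3rdOpinion if it is present, otherwise the first non-NA rater") can also be written without reshaping. The following is only a sketch of that idea, not part of the answer above: it uses dplyr's coalesce(), and final_answer is a hypothetical helper name introduced here for illustration. It assumes that 3rdOpinion is NA only when the two raters agree, as the question states, and that every column for a question starts with the question name followed by an underscore.

```r
library(dplyr)

# Input columns from the question, without the desired *_final columns.
df <- data.frame(
  Patient              = c("1", "2", "3"),
  Question1_Adam       = c(NA, "Yes", "No"),
  Question2_Adam       = c(NA, "Yes", "No"),
  Question1_Ben        = c("Yes", NA, "Unlikely"),
  Question2_Ben        = c("No", NA, "No"),
  Question1_Chris      = c("Probably", "Probably", NA),
  Question2_Chris      = c("Unlikely", "Yes", NA),
  Question1_3rdOpinion = c("Probably", "Yes", "No"),
  Question2_3rdOpinion = c("No", NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: try 3rdOpinion first, then the raters in column order.
final_answer <- function(data, question) {
  third  <- paste0(question, "_3rdOpinion")
  # All rater columns for this question, excluding 3rdOpinion and any *_final
  raters <- setdiff(
    grep(paste0("^", question, "_"), names(data), value = TRUE),
    c(third, paste0(question, "_final"))
  )
  # coalesce() returns, element-wise, the first non-NA value of its arguments
  do.call(coalesce, unname(as.list(data[c(third, raters)])))
}

for (q in c("Question1", "Question2")) {
  df[[paste0(q, "_final")]] <- final_answer(df, q)
}

df[c("Patient", "Question1_final", "Question2_final")]
```

With the sample data this gives "Probably", "Yes", "No" for Question1_final and "No", "Yes", "No" for Question2_final. Because the rater columns are found by prefix, the same helper should carry over to a pool of 10 raters, provided the column names follow the same pattern.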
