简体   繁体   中英

Conditionally replace certain values in my dataframe with other values in the dataframe in R

I have seen this dplyr mutate/replace several columns on a subset of rows and this Conditional replacement of values in a data.frame

I want to conditionally replace certain values in my dataframe with other values in my dataframe, based on the nested structure of my data.

I have a dataframe which, in some cases, has multiple rows representing a single study. In these cases, I have data representing overall ratings of the study (where effect = "All.outcomes.") followed by rows representing ratings of specific effects within the study ( effect = "Study.1..Effect 1", for example). The rows representing specific effects nested within a study always have missing data on a number of variables. In these cases, I would like to replace the NA values in the specific effect rows with data from the "All.outcomes" row for each StudyID.

Missing data is always on given variables: q1 , q2B , q2C , q2A , and q4 . In the cases where there is missing data on these variables, I would like to fill in the data from the line where effects = "All.outcomes."

Here's what it looks like now:

     studyID         effect  q1  q11 q2B  q2C q2a   q4   q9
a    s100      All.outcomes low   NA low high low  low   NA
b    s100 Study.1..Effect.1  NA high  NA   NA  NA   NA  low
c    s100 Study.1..Effect.2  NA high  NA   NA  NA   NA  low
d    s101      All.outcomes low  low low high low  low  low
e    s102      All.outcomes low  low low high low high  low
f    s104      All.outcomes low   NA low high low  low   NA
g    s104 Study.1..Effect.1  NA  low  NA   NA  NA   NA  low
h    s104 Study.2..Effect.1  NA high  NA   NA  NA   NA high
i    s104 Study.3..Effect.1  NA  low  NA   NA  NA   NA  low

Here's how I want the dataframe to look:

     studyID         effect  q1  q11 q2B  q2C q2a   q4   q9
a    s100      All.outcomes low   NA low high low  low   NA
b    s100 Study.1..Effect.1 low high low high low  low  low
c    s100 Study.1..Effect.2 low high low high low  low  low
f    s104      All.outcomes low   NA low high low  low   NA
g    s104 Study.1..Effect.1 low  low low high low  low  low
h    s104 Study.2..Effect.1 low high low high low  low high
i    s104 Study.3..Effect.1 low  low low high low  low  low

You can see that, for example, for s100 effect = "Study.1..Effect.1" and "Study.1..Effect.2", I filled in q1 , q2B , q2C , q2A , and q4 with values from effect = `All.outcomes."

Note that here are studyIDs in the database where this pattern does not occur (ie, they only have effect = "All.outcomes") and that in cases where a given study does have multiple effects nested within it, the number and names of these nested effects vary (as can be seen comparing studyID = s100 and s104 in the sample data).

Is there an efficient solution to this? Thank you in advance for your help.

#sample data
a <- c("s100", "All.outcomes", "low", "NA", "low", "high", "low", "low", "NA")
b <- c("s100",  "Study.1..Effect.1",    "NA",   "high", "NA",   "NA",   "NA",   "NA",   "low")
c <- c("s100",  "Study.1..Effect.2",    "NA",   "high", "NA",   "NA",   "NA",   "NA",   "low")
d <- c("s101",  "All.outcomes", "low",  "low",  "low",  "high", "low",  "low",  "low")
e <- c("s102",  "All.outcomes", "low",  "low",  "low",  "high", "low",  "high", "low")
f <- c("s104",  "All.outcomes", "low",  "NA",   "low",  "high", "low",  "low",  "NA")
g <- c("s104",  "Study.1..Effect.1",    "NA",   "low",  "NA",   "NA",   "NA",   "NA",   "low")
h <- c("s104",  "Study.2..Effect.1",    "NA",   "high", "NA",   "NA",   "NA",   "NA",   "high")
i <- c("s104",  "Study.3..Effect.1",    "NA",   "low",  "NA",   "NA",   "NA",   "NA",   "low")

df <- as.data.frame(rbind(a,b,c,d,e,f,g,h,i))
colnames(df)=c("studyID", "effect", "q1", "q11", "q2B", "q2C", "q2a", "q4", "q9")

Using your sample data:

library(dplyr)
library(tidyr)

df %>% 
  replace(. == "NA", NA_character_) %>% 
  group_by(studyID) %>% 
  fill(c(q1,q11,q2B,q2C,q2a,q4,q9),.direction = "down")

This gives us:

# A tibble: 9 x 9
# Groups:   studyID [4]
  studyID effect            q1    q11   q2B   q2C   q2a   q4    q9   
  <chr>   <chr>             <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 s100    All.outcomes      low   NA    low   high  low   low   NA   
2 s100    Study.1..Effect.1 low   high  low   high  low   low   low  
3 s100    Study.1..Effect.2 low   high  low   high  low   low   low  
4 s101    All.outcomes      low   low   low   high  low   low   low  
5 s102    All.outcomes      low   low   low   high  low   high  low  
6 s104    All.outcomes      low   NA    low   high  low   low   NA   
7 s104    Study.1..Effect.1 low   low   low   high  low   low   low  
8 s104    Study.2..Effect.1 low   high  low   high  low   low   high 
9 s104    Study.3..Effect.1 low   low   low   high  low   low   low  

For each studyID you can replace the NA values from columns q1 to q9 with corresponding values where effect == 'All.outcomes' .

library(dplyr)
df %>%
  group_by(studyID) %>%
  mutate(across(q1:q9, ~replace(., . == 'NA', .[effect == 'All.outcomes'])))

#  studyID effect            q1    q11   q2B   q2C   q2a   q4    q9   
#  <chr>   <chr>             <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 s100    All.outcomes      low   NA    low   high  low   low   NA   
#2 s100    Study.1..Effect.1 low   high  low   high  low   low   low  
#3 s100    Study.1..Effect.2 low   high  low   high  low   low   low  
#4 s101    All.outcomes      low   low   low   high  low   low   low  
#5 s102    All.outcomes      low   low   low   high  low   high  low  
#6 s104    All.outcomes      low   NA    low   high  low   low   NA   
#7 s104    Study.1..Effect.1 low   low   low   high  low   low   low  
#8 s104    Study.2..Effect.1 low   high  low   high  low   low   high 
#9 s104    Study.3..Effect.1 low   low   low   high  low   low   low  

For the data shared you have string "NA" , for your actual data if you have real NA use is.na(.) in replace instead of . == 'NA' . == 'NA' .

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM