I have seen this dplyr mutate/replace several columns on a subset of rows and this Conditional replacement of values in a data.frame
I want to conditionally replace certain values in my dataframe with other values in my dataframe, based on the nested structure of my data.
I have a dataframe which, in some cases, has multiple rows representing a single study. In these cases, I have data representing overall ratings of the study (where effect
= "All.outcomes.") followed by rows representing ratings of specific effects within the study ( effect
= "Study.1..Effect 1", for example). The rows representing specific effects nested within a study always have missing data on a number of variables. In these cases, I would like to replace the NA values in the specific effect rows with data from the "All.outcomes" row for each StudyID.
Missing data is always on given variables: q1
, q2B
, q2C
, q2A
, and q4
. In the cases where there is missing data on these variables, I would like to fill in the data from the line where effects
= "All.outcomes."
Here's what it looks like now:
studyID effect q1 q11 q2B q2C q2a q4 q9
a s100 All.outcomes low NA low high low low NA
b s100 Study.1..Effect.1 NA high NA NA NA NA low
c s100 Study.1..Effect.2 NA high NA NA NA NA low
d s101 All.outcomes low low low high low low low
e s102 All.outcomes low low low high low high low
f s104 All.outcomes low NA low high low low NA
g s104 Study.1..Effect.1 NA low NA NA NA NA low
h s104 Study.2..Effect.1 NA high NA NA NA NA high
i s104 Study.3..Effect.1 NA low NA NA NA NA low
Here's how I want the dataframe to look:
studyID effect q1 q11 q2B q2C q2a q4 q9
a s100 All.outcomes low NA low high low low NA
b s100 Study.1..Effect.1 low high low high low low low
c s100 Study.1..Effect.2 low high low high low low low
f s104 All.outcomes low NA low high low low NA
g s104 Study.1..Effect.1 low low low high low low low
h s104 Study.2..Effect.1 low high low high low low high
i s104 Study.3..Effect.1 low low low high low low low
You can see that, for example, for s100 effect
= "Study.1..Effect.1" and "Study.1..Effect.2", I filled in q1
, q2B
, q2C
, q2A
, and q4
with values from effect
= `All.outcomes."
Note that here are studyIDs in the database where this pattern does not occur (ie, they only have effect
= "All.outcomes") and that in cases where a given study does have multiple effects nested within it, the number and names of these nested effects vary (as can be seen comparing studyID
= s100 and s104 in the sample data).
Is there an efficient solution to this? Thank you in advance for your help.
#sample data
a <- c("s100", "All.outcomes", "low", "NA", "low", "high", "low", "low", "NA")
b <- c("s100", "Study.1..Effect.1", "NA", "high", "NA", "NA", "NA", "NA", "low")
c <- c("s100", "Study.1..Effect.2", "NA", "high", "NA", "NA", "NA", "NA", "low")
d <- c("s101", "All.outcomes", "low", "low", "low", "high", "low", "low", "low")
e <- c("s102", "All.outcomes", "low", "low", "low", "high", "low", "high", "low")
f <- c("s104", "All.outcomes", "low", "NA", "low", "high", "low", "low", "NA")
g <- c("s104", "Study.1..Effect.1", "NA", "low", "NA", "NA", "NA", "NA", "low")
h <- c("s104", "Study.2..Effect.1", "NA", "high", "NA", "NA", "NA", "NA", "high")
i <- c("s104", "Study.3..Effect.1", "NA", "low", "NA", "NA", "NA", "NA", "low")
df <- as.data.frame(rbind(a,b,c,d,e,f,g,h,i))
colnames(df)=c("studyID", "effect", "q1", "q11", "q2B", "q2C", "q2a", "q4", "q9")
Using your sample data:
library(dplyr)
library(tidyr)
df %>%
replace(. == "NA", NA_character_) %>%
group_by(studyID) %>%
fill(c(q1,q11,q2B,q2C,q2a,q4,q9),.direction = "down")
This gives us:
# A tibble: 9 x 9
# Groups: studyID [4]
studyID effect q1 q11 q2B q2C q2a q4 q9
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 s100 All.outcomes low NA low high low low NA
2 s100 Study.1..Effect.1 low high low high low low low
3 s100 Study.1..Effect.2 low high low high low low low
4 s101 All.outcomes low low low high low low low
5 s102 All.outcomes low low low high low high low
6 s104 All.outcomes low NA low high low low NA
7 s104 Study.1..Effect.1 low low low high low low low
8 s104 Study.2..Effect.1 low high low high low low high
9 s104 Study.3..Effect.1 low low low high low low low
For each studyID
you can replace
the NA
values from columns q1
to q9
with corresponding values where effect == 'All.outcomes'
.
library(dplyr)
df %>%
group_by(studyID) %>%
mutate(across(q1:q9, ~replace(., . == 'NA', .[effect == 'All.outcomes'])))
# studyID effect q1 q11 q2B q2C q2a q4 q9
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 s100 All.outcomes low NA low high low low NA
#2 s100 Study.1..Effect.1 low high low high low low low
#3 s100 Study.1..Effect.2 low high low high low low low
#4 s101 All.outcomes low low low high low low low
#5 s102 All.outcomes low low low high low high low
#6 s104 All.outcomes low NA low high low low NA
#7 s104 Study.1..Effect.1 low low low high low low low
#8 s104 Study.2..Effect.1 low high low high low low high
#9 s104 Study.3..Effect.1 low low low high low low low
For the data shared you have string "NA"
, for your actual data if you have real NA
use is.na(.)
in replace
instead of . == 'NA'
. == 'NA'
.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.