I'm researching the lines of communication a patient goes through when they get sick. For example: a person gets sick and goes to the doctor (A), then to the hospital (B), then contacts the insurance company (C), and so on. The order differs per patient: one patient goes directly to the hospital, while another first checks with the insurance. We followed patients through the whole process, and after each contact with a different authority we had them fill out another survey. So after each authority ("step") we have a survey score. This gives me the following dataset set-up (in reality it is a very large dataset):
Patient<-c(1,1,1,1,1,1,1,2,2,2,2)
sample6<-c("A","A","A","A","A","A","A","A","A","A","A")
sample5<-c("Stop","B","B","B","B","B","B","Stop","C","C","C")
sample4<-c(NA,"Stop","C","C","C","C","C",NA, "Stop","F","F")
sample3<-c(NA,NA,"Stop","D","D","D","D",NA, NA,"Stop","G")
sample2<-c(NA,NA,NA,"Stop","E","E","E",NA, NA,NA,"Stop")
sample1<-c(NA,NA,NA,NA, "Stop","F","F",NA,NA,NA, NA)
sample0<-c(NA,NA,NA,NA, NA,"Stop","G",NA,NA,NA, NA)
sample00<-c(NA,NA,NA,NA, NA,NA,"Stop",NA,NA,NA, NA)
Score<-c(90,88,65,44,78,98,66,38,93,88,80)
Time<-c("01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018", "06-01-2018", "07-01-2018","01-02-2018", "02-02-2018", "05-02-2018", "06-02-2018")
df<-data.frame("Patient"=Patient, "step0"=sample6, "step1"=sample5, "step2"=sample4, "step3"=sample3, "step4"=sample2,
"step5"=sample1,"step6"= sample0, "step7"=sample00, "Score"=Score, "Time"=Time)
> df
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time
1 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018
2 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018
3 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018
4 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018
5 1 A B C D E Stop <NA> <NA> 78 05-01-2018
6 1 A B C D E F Stop <NA> 98 06-01-2018
7 1 A B C D E F G Stop 66 07-01-2018
8 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018
9 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018
10 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018
11 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018
So for example: row 1 has the survey score after authority A, row 2 is for the same patient and has the survey score after authority B, etc. Now I want to compare rows that have the same final authority. I will take "F" as an example, but it could also be "C" for another analysis. So I want to select all rows that have "F" as the final authority AND the row before each of them, so that I can compare them.
So I want to create this dataset:
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
1 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018 0
2 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018 0
3 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018 0
4 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018 0
5 1 A B C D E Stop <NA> <NA> 78 05-01-2018 Before
6 1 A B C D E F Stop <NA> 98 06-01-2018 After
7 1 A B C D E F G Stop 66 07-01-2018 0
8 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018 0
9 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018 Before
10 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018 After
11 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018 0
I did manage to indicate the rows that contain "F" plus the previous:
ProcessColumns <- 2:9
d <- df[,ProcessColumns] == "F"
df$Indicator <- rowSums(d, na.rm = TRUE)
df$Indicator[which(df$Indicator %in% 1) - 1] <- "Before"
df$Indicator[which(df$Indicator %in% 1)] <- "After"
But this flags ALL the rows containing "F", not just the rows where "F" is the final authority. Can anyone help me?
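We can do something like
library(dplyr)
# Note: this flags rows by position (second-to-last and third-to-last row per patient),
# which coincides with "F" being the final authority in this example.
df %>%
  mutate(sum = rowSums(!is.na(.[2:9]))) %>%   # number of filled step columns per row
  group_by(Patient) %>%
  mutate(max = sum - max(sum),                # distance from the patient's longest row
         Indicator = case_when(max == -2 ~ "Before",
                               max == -1 ~ "After",
                               TRUE ~ as.character(0)))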
# A tibble: 11 x 14
# Groups: Patient [2]
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time sum max Ind
<dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <dbl> <fct> <dbl> <dbl> <chr>
1 1.00 A Stop NA NA NA NA NA NA 90.0 01-01-2018 2.00 -6.00 0
2 1.00 A B Stop NA NA NA NA NA 88.0 02-01-2018 3.00 -5.00 0
3 1.00 A B C Stop NA NA NA NA 65.0 03-01-2018 4.00 -4.00 0
4 1.00 A B C D Stop NA NA NA 44.0 04-01-2018 5.00 -3.00 0
5 1.00 A B C D E Stop NA NA 78.0 05-01-2018 6.00 -2.00 Before
6 1.00 A B C D E F Stop NA 98.0 06-01-2018 7.00 -1.00 After
7 1.00 A B C D E F G Stop 66.0 07-01-2018 8.00 0 0
8 2.00 A Stop NA NA NA NA NA NA 38.0 01-02-2018 2.00 -3.00 0
9 2.00 A C Stop NA NA NA NA NA 93.0 02-02-2018 3.00 -2.00 Before
10 2.00 A C F Stop NA NA NA NA 88.0 05-02-2018 4.00 -1.00 After
11 2.00 A C F G Stop NA NA NA 80.0 06-02-2018 5.00 0 0
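The approach above flags rows by their distance from each patient's last row, which works here because F happens to be the second-to-last authority for both patients. A minimal base R sketch that instead looks up the authority sitting right before "Stop" in every row might look like the following. The names `target`, `final`, and `same_patient_next` are illustrative and not from the original post, and the sketch assumes the rows are sorted by patient and time and that every row contains a "Stop" after at least one authority.

```r
# Rebuild the example data (Time omitted, strings kept as character)
df <- data.frame(
  Patient = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  step0 = rep("A", 11),
  step1 = c("Stop", "B", "B", "B", "B", "B", "B", "Stop", "C", "C", "C"),
  step2 = c(NA, "Stop", "C", "C", "C", "C", "C", NA, "Stop", "F", "F"),
  step3 = c(NA, NA, "Stop", "D", "D", "D", "D", NA, NA, "Stop", "G"),
  step4 = c(NA, NA, NA, "Stop", "E", "E", "E", NA, NA, NA, "Stop"),
  step5 = c(NA, NA, NA, NA, "Stop", "F", "F", NA, NA, NA, NA),
  step6 = c(NA, NA, NA, NA, NA, "Stop", "G", NA, NA, NA, NA),
  step7 = c(NA, NA, NA, NA, NA, NA, "Stop", NA, NA, NA, NA),
  Score = c(90, 88, 65, 44, 78, 98, 66, 38, 93, 88, 80),
  stringsAsFactors = FALSE
)

target <- "F"  # assumption: the authority to analyse

# Character matrix of the step columns only
steps <- as.matrix(df[grep("^step", names(df), value = TRUE)])

# Column index of "Stop" in each row; the final authority is one column to its left
stop_pos <- apply(steps, 1, function(r) match("Stop", r))
final <- steps[cbind(seq_len(nrow(steps)), stop_pos - 1)]

# Rows whose final authority is the target
after <- final %in% target

# TRUE where the next row belongs to the same patient
same_patient_next <- c(df$Patient[-1] == df$Patient[-nrow(df)], FALSE)

# A row is "Before" when the next row is an "After" row of the same patient
before <- c(after[-1], FALSE) & same_patient_next

df$Indicator <- "0"
df$Indicator[after] <- "After"
df$Indicator[before] <- "Before"
```

Because the final authority is read per row, this should keep working when the target is not the second-to-last step, and the patient check avoids marking another patient's last row as "Before".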
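Update: Inspired by @Andre Elrico's answer
library(dplyr)
library(tidyr)
library(stringr)
# "FStop" marks rows whose final authority is F; change the letter to target another authority
df %>%
  unite(All, matches("step"), sep = "", remove = FALSE) %>%
  mutate(Ind = str_detect(All, "FStop"),
         Indicator = case_when(lead(Ind) == TRUE ~ "Before",
                               Ind == TRUE ~ "After",
                               TRUE ~ as.character(0))) %>%
  select(-All, -Ind)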
Or you can:
library(dplyr)
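# Paste each row into one string; "F" directly followed by "Stop" means F is the final authority
After_IND <- df %>% apply(., 1, paste, collapse = "") %>% grepl("FStop", .)
# The row before each "After" row (FALSE is used for the last row, which has no successor)
Before_IND <- lead(After_IND, 1, default = FALSE)
df$Indicator <- 0
df$Indicator[After_IND] <- "After"
df$Indicator[Before_IND] <- "Before"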
# Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
# 1 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 90 01-01-2018 0
# 1 A B Stop <NA> <NA> <NA> <NA> <NA> 88 02-01-2018 0
# 1 A B C Stop <NA> <NA> <NA> <NA> 65 03-01-2018 0
# 1 A B C D Stop <NA> <NA> <NA> 44 04-01-2018 0
# 1 A B C D E Stop <NA> <NA> 78 05-01-2018 Before
# 1 A B C D E F Stop <NA> 98 06-01-2018 After
# 1 A B C D E F G Stop 66 07-01-2018 0
# 2 A Stop <NA> <NA> <NA> <NA> <NA> <NA> 38 01-02-2018 0
# 2 A C Stop <NA> <NA> <NA> <NA> <NA> 93 02-02-2018 Before
# 2 A C F Stop <NA> <NA> <NA> <NA> 88 05-02-2018 After
# 2 A C F G Stop <NA> <NA> <NA> 80 06-02-2018 0
Please note:
If you want to compare another authority, for example B, change the pattern:
... %>% grepl("BStop", .)
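Since the pattern has to be edited for each authority, it can be convenient to wrap the idea in a function. The sketch below is one possible way to do that; `flag_final` and its arguments are hypothetical names, not part of the answer above. It joins only the step columns with a ">" separator, on the assumption that authority names may be longer than one letter (with `collapse = ""`, an authority called "AF" followed by "Stop" would also match "FStop"). It further assumes names contain no regex special characters and that rows are sorted by patient and time.

```r
# Hypothetical helper: flag rows ending in `target` as "After" and the
# preceding row of the same patient as "Before".
flag_final <- function(df, target,
                       step_cols = grep("^step", names(df), value = TRUE)) {
  # Join the step columns with a separator so adjacent names cannot run together
  paths <- apply(df[step_cols], 1, paste, collapse = ">")

  # Target must be a whole step, immediately followed by "Stop"
  after <- grepl(paste0("(^|>)", target, ">Stop(>|$)"), paths)

  # "Before" = next row is an "After" row and belongs to the same patient
  same_patient_next <- c(df$Patient[-1] == df$Patient[-nrow(df)], FALSE)
  before <- c(after[-1], FALSE) & same_patient_next

  df$Indicator <- "0"
  df$Indicator[after] <- "After"
  df$Indicator[before] <- "Before"
  df
}

# Toy data with a two-letter authority "AF" to show the separator matters
toy <- data.frame(
  Patient = c(1, 1, 1),
  step0 = c("AF", "AF", "AF"),
  step1 = c("Stop", "F", "F"),
  step2 = c(NA, "Stop", "G"),
  step3 = c(NA, NA, "Stop"),
  stringsAsFactors = FALSE
)

res <- flag_final(toy, "F")
res$Indicator
# "Before" "After" "0"
```

The first toy row ends in "AF", not "F", so it is not flagged as "After"; it only becomes "Before" because the next row ends in "F".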
A tidyverse approach with a lot of lines, but it generally works.
library(tidyverse)
df %>%
rownames_to_column() %>%
gather(k,v,-Patient,-rowname,-Score, -Time) %>%
group_by(rowname) %>%
mutate(Indicator=ifelse(any(v %in%"F" ),"After",NA)) %>%
spread(k,v) %>%
arrange(as.numeric(rowname)) %>%
group_by(Patient) %>%
mutate(Indicator=ifelse(duplicated(Indicator), NA, Indicator)) %>%
mutate(Indicator2=ifelse(lead(Indicator) == "After", "Before", NA)) %>%
mutate(Indicator=ifelse(!is.na(Indicator2), Indicator2, Indicator)) %>%
select(Patient, starts_with("step"), Score, Time,Indicator, -Indicator2,-rowname) %>%
ungroup()
# A tibble: 11 x 12
Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time Indicator
<dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <fct> <chr>
1 1 A Stop NA NA NA NA NA NA 90 01-01-2018 NA
2 1 A B Stop NA NA NA NA NA 88 02-01-2018 NA
3 1 A B C Stop NA NA NA NA 65 03-01-2018 NA
4 1 A B C D Stop NA NA NA 44 04-01-2018 NA
5 1 A B C D E Stop NA NA 78 05-01-2018 Before
6 1 A B C D E F Stop NA 98 06-01-2018 After
7 1 A B C D E F G Stop 66 07-01-2018 NA
8 2 A Stop NA NA NA NA NA NA 38 01-02-2018 NA
9 2 A C Stop NA NA NA NA NA 93 02-02-2018 Before
10 2 A C F Stop NA NA NA NA 88 05-02-2018 After
11 2 A C F G Stop NA NA NA 80 06-02-2018 NA
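Once the rows are flagged, the comparison the question is after can be set up by placing each patient's Before and After scores side by side. A rough base R sketch, with the four flagged rows from the output above typed in by hand; the names `flagged`, `wide`, and `Change` are only for illustration.

```r
# The four flagged rows from the example (Patient, Score, Indicator only)
flagged <- data.frame(
  Patient = c(1, 1, 2, 2),
  Score = c(78, 98, 93, 88),
  Indicator = c("Before", "After", "Before", "After")
)

# One row per patient, with columns Score.Before and Score.After
wide <- reshape(flagged, idvar = "Patient", timevar = "Indicator",
                direction = "wide")

# Change in survey score from the step before F to the step ending in F
wide$Change <- wide$Score.After - wide$Score.Before
```

With the example data, patient 1 goes from 78 to 98 and patient 2 from 93 to 88.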