
Filter or ifelse across multiple columns

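I am studying the lines of communication patients follow when they get sick. For example: a person falls ill, sees a doctor (A), then goes to the hospital (B), then contacts the insurer (C), and so on. The order differs per patient: one patient may go straight to the hospital, while another checks with the insurer first. We follow patients through the whole process, and after each contact with an authority we have them fill in another survey. So after every authority (a "step") we obtain a survey score. This gives me the following dataset layout (the real dataset is much larger):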

Patient<-c(1,1,1,1,1,1,1,2,2,2,2)
sample6<-c("A","A","A","A","A","A","A","A","A","A","A")
sample5<-c("Stop","B","B","B","B","B","B","Stop","C","C","C")
sample4<-c(NA,"Stop","C","C","C","C","C",NA, "Stop","F","F")
sample3<-c(NA,NA,"Stop","D","D","D","D",NA, NA,"Stop","G")
sample2<-c(NA,NA,NA,"Stop","E","E","E",NA, NA,NA,"Stop")
sample1<-c(NA,NA,NA,NA, "Stop","F","F",NA,NA,NA, NA)
sample0<-c(NA,NA,NA,NA, NA,"Stop","G",NA,NA,NA, NA)
sample00<-c(NA,NA,NA,NA, NA,NA,"Stop",NA,NA,NA, NA)
Score<-c(90,88,65,44,78,98,66,38,93,88,80)
Time<-c("01-01-2018", "02-01-2018", "03-01-2018", "04-01-2018", "05-01-2018", "06-01-2018", "07-01-2018","01-02-2018", "02-02-2018", "05-02-2018", "06-02-2018")

df<-data.frame("Patient"=Patient, "step0"=sample6, "step1"=sample5, "step2"=sample4, "step3"=sample3, "step4"=sample2, 
               "step5"=sample1,"step6"= sample0, "step7"=sample00, "Score"=Score, "Time"=Time)

> df
   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time
1        1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018
2        1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018
3        1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018
4        1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018
5        1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018
6        1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018
7        1     A     B     C     D     E     F     G  Stop    66 07-01-2018
8        2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018
9        2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018

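So, for example, row 1 holds the survey score after authority A, row 2 holds the same patient's score after authority B, and so on. Now I want to compare rows that share the same final authority. I will use "F" as the example, but for other analyses it could just as well be "C". So I want to select every row whose final authority is "F", together with the row immediately before it, so that the two can be compared.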

So I would like to create this dataset:

   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time Indicator
1        1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018         0
2        1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018         0
3        1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018         0
4        1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018         0
5        1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018         Before
6        1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018         After
7        1     A     B     C     D     E     F     G  Stop    66 07-01-2018         0
8        2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018         0
9        2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018         Before
10       2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018         After
11       2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018         0

I did manage to flag the rows that contain "F", plus the row before each of them:

ProcessColumns <- 2:9
d <- df[, ProcessColumns] == "F"
df$Indicator <- rowSums(d, na.rm = TRUE)  # 1 if "F" occurs in any step column of the row
df$Indicator[which(df$Indicator %in% 1) - 1] <- "Before"
df$Indicator[which(df$Indicator %in% 1)] <- "After"

But this flags every row that contains "F" anywhere, not only the rows where "F" is the final step. Can anyone help me?

We can do something like this:

library(dplyr)
df %>% mutate(sum = rowSums(!is.na(.[2:9]))) %>%
  group_by(Patient) %>%
  mutate(max = sum - max(sum), Indicator = case_when(max == -2 ~ "Before", max == -1 ~ "After", TRUE ~ as.character(0)))

# A tibble: 11 x 14
# Groups:   Patient [2]
     Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time         sum   max Indicator
     <dbl> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <fct> <dbl> <fct>      <dbl> <dbl> <chr> 
 1    1.00 A     Stop  NA    NA    NA    NA    NA    NA     90.0 01-01-2018  2.00 -6.00 0     
 2    1.00 A     B     Stop  NA    NA    NA    NA    NA     88.0 02-01-2018  3.00 -5.00 0     
 3    1.00 A     B     C     Stop  NA    NA    NA    NA     65.0 03-01-2018  4.00 -4.00 0     
 4    1.00 A     B     C     D     Stop  NA    NA    NA     44.0 04-01-2018  5.00 -3.00 0     
 5    1.00 A     B     C     D     E     Stop  NA    NA     78.0 05-01-2018  6.00 -2.00 Before
 6    1.00 A     B     C     D     E     F     Stop  NA     98.0 06-01-2018  7.00 -1.00 After 
 7    1.00 A     B     C     D     E     F     G     Stop   66.0 07-01-2018  8.00  0    0     
 8    2.00 A     Stop  NA    NA    NA    NA    NA    NA     38.0 01-02-2018  2.00 -3.00 0     
 9    2.00 A     C     Stop  NA    NA    NA    NA    NA     93.0 02-02-2018  3.00 -2.00 Before
10    2.00 A     C     F     Stop  NA    NA    NA    NA     88.0 05-02-2018  4.00 -1.00 After 
11    2.00 A     C     F     G     Stop  NA    NA    NA     80.0 06-02-2018  5.00  0    0 
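
The pipeline above labels rows purely by their position relative to each patient's longest path, so it never checks which authority actually came last. A minimal base-R sketch that looks up the final authority explicitly (the value in the column just before "Stop") could look like the following. It rebuilds the question's example data with character columns and without the Time column. The names `step_cols`, `stop_pos`, `final_step` and `target` are introduced here for illustration only, and the sketch assumes every row contains a "Stop" that never sits in the first step column.

```r
# Example data from the question (step columns as character, Time omitted).
df <- data.frame(
  Patient = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  step0 = "A",
  step1 = c("Stop", "B", "B", "B", "B", "B", "B", "Stop", "C", "C", "C"),
  step2 = c(NA, "Stop", "C", "C", "C", "C", "C", NA, "Stop", "F", "F"),
  step3 = c(NA, NA, "Stop", "D", "D", "D", "D", NA, NA, "Stop", "G"),
  step4 = c(NA, NA, NA, "Stop", "E", "E", "E", NA, NA, NA, "Stop"),
  step5 = c(NA, NA, NA, NA, "Stop", "F", "F", NA, NA, NA, NA),
  step6 = c(NA, NA, NA, NA, NA, "Stop", "G", NA, NA, NA, NA),
  step7 = c(NA, NA, NA, NA, NA, NA, "Stop", NA, NA, NA, NA),
  Score = c(90, 88, 65, 44, 78, 98, 66, 38, 93, 88, 80),
  stringsAsFactors = FALSE
)

step_cols <- grep("^step", names(df), value = TRUE)
steps <- as.matrix(df[step_cols])

# Column index of "Stop" in each row; the final authority is one column earlier.
stop_pos <- apply(steps, 1, function(r) match("Stop", r))
final_step <- steps[cbind(seq_len(nrow(steps)), stop_pos - 1)]

target <- "F"  # authority of interest (assumption: change to "C", "B", ...)
n <- nrow(df)

# "After": the row whose final authority is the target.
after <- !is.na(final_step) & final_step == target
# "Before": the row directly preceding an "After" row of the same patient.
before <- c(after[-1], FALSE) & c(df$Patient[-1] == df$Patient[-n], FALSE)

df$Indicator <- "0"
df$Indicator[after] <- "After"
df$Indicator[before] <- "Before"
```

Because `final_step` is computed for every row, switching to another authority only requires changing `target`.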

Update: inspired by @Andre Elrico's answer

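library(dplyr); library(tidyr); library(stringr)
# "FStop" matches rows whose final step is F; use e.g. "BStop" for another authority
df %>% unite(All, matches("step"), sep = "", remove = FALSE) %>%
       mutate(Ind = str_detect(All, "FStop"), Indicator = case_when(lead(Ind) == TRUE ~ "Before", Ind == TRUE ~ "After", TRUE ~ as.character(0))) %>%
       select(-All, -Ind)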

Or you can do:

library(dplyr)

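# Paste each row into one string; a final step of F then shows up as "FStop"
After_IND <- df %>% apply(., 1, paste, collapse = "") %>% grepl("FStop", .)
# The row directly before each "After" row
Before_IND <- lead(After_IND, 1, FALSE)

df$Indicator <- 0
df$Indicator[After_IND] <- "After"
df$Indicator[Before_IND] <- "Before"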

#  Patient step0 step1 step2 step3 step4 step5 step6 step7 Score       Time Indicator
#        1     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    90 01-01-2018         0
#        1     A     B  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    88 02-01-2018         0
#        1     A     B     C  Stop  <NA>  <NA>  <NA>  <NA>    65 03-01-2018         0
#        1     A     B     C     D  Stop  <NA>  <NA>  <NA>    44 04-01-2018         0
#        1     A     B     C     D     E  Stop  <NA>  <NA>    78 05-01-2018    Before
#        1     A     B     C     D     E     F  Stop  <NA>    98 06-01-2018     After
#        1     A     B     C     D     E     F     G  Stop    66 07-01-2018         0
#        2     A  Stop  <NA>  <NA>  <NA>  <NA>  <NA>  <NA>    38 01-02-2018         0
#        2     A     C  Stop  <NA>  <NA>  <NA>  <NA>  <NA>    93 02-02-2018    Before
#        2     A     C     F  Stop  <NA>  <NA>  <NA>  <NA>    88 05-02-2018     After
#        2     A     C     F     G  Stop  <NA>  <NA>  <NA>    80 06-02-2018         0

Please note:

If you want to compare B instead, for example, you have to change the pattern:

... %>% grepl("BStop",.)
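
Changing the pattern by hand can also be wrapped in a small function. The sketch below is one possible base-R way to do that; `mark_final_step` and its arguments are hypothetical names, not part of any package. It pastes only the step columns (so a Score or Time value cannot cause an accidental match), separates the labels with "|" so that multi-character labels are matched whole, and only marks a "Before" row when it belongs to the same patient as the following "After" row. It assumes the labels themselves never contain "|" and that rows are ordered by patient and time.

```r
# Hypothetical helper: mark rows whose final step is `target` ("After")
# and the row directly preceding them within the same patient ("Before").
mark_final_step <- function(data, target, step_pattern = "^step", id_col = "Patient") {
  step_cols <- grep(step_pattern, names(data), value = TRUE)

  # Build one path string per row, e.g. "|A|B|F|Stop|NA|" (NA is pasted as "NA").
  parts <- unname(lapply(data[step_cols], as.character))
  path <- paste0("|", do.call(paste, c(parts, sep = "|")), "|")

  # Literal match of "<target> immediately followed by Stop".
  after <- grepl(paste0("|", target, "|Stop|"), path, fixed = TRUE)

  # Shift "after" up by one row, but only within the same patient.
  n <- nrow(data)
  ids <- data[[id_col]]
  before <- c(after[-1], FALSE) & c(ids[-1] == ids[-n], FALSE)

  data$Indicator <- "0"
  data$Indicator[after] <- "After"
  data$Indicator[before] <- "Before"
  data
}

# Small toy data set in the same layout as the question's data.
toy <- data.frame(
  Patient = c(1, 1, 1, 2, 2),
  step0 = "A",
  step1 = c("Stop", "B", "B", "Stop", "B"),
  step2 = c(NA, "Stop", "F", NA, "Stop"),
  step3 = c(NA, NA, "Stop", NA, NA),
  stringsAsFactors = FALSE
)

res_b <- mark_final_step(toy, "B")
res_f <- mark_final_step(toy, "F")
```

With this helper, `mark_final_step(df, "F")` and `mark_final_step(df, "C")` can be run on the same data frame without editing a pattern string.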

A tidyverse solution with many lines, but it works in general.

library(tidyverse)
df %>%
  rownames_to_column() %>% 
  gather(k,v,-Patient,-rowname,-Score, -Time) %>% 
  group_by(rowname) %>% 
  mutate(Indicator=ifelse(any(v %in%"F" ),"After",NA)) %>% 
  spread(k,v)  %>% 
  arrange(as.numeric(rowname)) %>% 
  group_by(Patient) %>% 
  mutate(Indicator=ifelse(duplicated(Indicator), NA, Indicator)) %>% 
  mutate(Indicator2=ifelse(lead(Indicator) == "After", "Before", NA)) %>% 
  mutate(Indicator=ifelse(!is.na(Indicator2), Indicator2, Indicator)) %>% 
  select(Patient, starts_with("step"), Score, Time,Indicator, -Indicator2,-rowname) %>% 
  ungroup()
# A tibble: 11 x 12
   Patient step0 step1 step2 step3 step4 step5 step6 step7 Score Time       Indicator
     <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <fct>      <chr>    
 1       1 A     Stop  NA    NA    NA    NA    NA    NA       90 01-01-2018 NA       
 2       1 A     B     Stop  NA    NA    NA    NA    NA       88 02-01-2018 NA       
 3       1 A     B     C     Stop  NA    NA    NA    NA       65 03-01-2018 NA       
 4       1 A     B     C     D     Stop  NA    NA    NA       44 04-01-2018 NA       
 5       1 A     B     C     D     E     Stop  NA    NA       78 05-01-2018 Before   
 6       1 A     B     C     D     E     F     Stop  NA       98 06-01-2018 After    
 7       1 A     B     C     D     E     F     G     Stop     66 07-01-2018 NA       
 8       2 A     Stop  NA    NA    NA    NA    NA    NA       38 01-02-2018 NA       
 9       2 A     C     Stop  NA    NA    NA    NA    NA       93 02-02-2018 Before   
10       2 A     C     F     Stop  NA    NA    NA    NA       88 05-02-2018 After    
11       2 A     C     F     G     Stop  NA    NA    NA       80 06-02-2018 NA  
