
I am looking for an optimal way to convert the categories of a variable into dummy variables

I have some patients who receive different treatments at different times. I want to change the treatment they have received into a binary variable that takes the value of 1 if the patient has received the drug at least once and the value of 0 if they have never received it.

I managed to do this but in a tedious way, which could be difficult with dozens of different types of drugs.

I would like to optimize my code, mainly to avoid creating the binary variable for each drug one by one.

library(dplyr)

id<-c(rep(1,7),rep(2,4))
medoc<-c("par","mor","mor","par","sed","sed",
         "sed","cur","sed","cur","sed")

mydata<-data.frame(id,medoc)

mydata2<-mydata%>%group_by(id)%>%
  mutate(medoc_str=paste(unique(medoc),collapse  = " "))%>%
  distinct(id,.keep_all = TRUE)

mydata2$par<-NA
mydata2$mor<-NA
mydata2$sed<-NA
mydata2$cur<-NA

mydata2$par<-ifelse(
  grepl("par",mydata2$medoc_str)==TRUE,1,0
)

mydata2$mor<-ifelse(
  grepl("mor",mydata2$medoc_str)==TRUE,1,0
)

mydata2$sed<-ifelse(
  grepl("sed",mydata2$medoc_str)==TRUE,1,0
)

mydata2$cur<-ifelse(
  grepl("cur",mydata2$medoc_str)==TRUE,1,0
)

If I understand correctly, you want to dummify your variable. This can be done with tidyr::pivot_wider too, but a dedicated package makes it very easy. I like the fastDummies package:

library(fastDummies)

dummy_cols(mydata, select_columns = 'medoc')

   id medoc medoc_cur medoc_mor medoc_par medoc_sed
1   1   par         0         0         1         0
2   1   mor         0         1         0         0
3   1   mor         0         1         0         0
4   1   par         0         0         1         0
5   1   sed         0         0         0         1
6   1   sed         0         0         0         1
7   1   sed         0         0         0         1
8   2   cur         1         0         0         0
9   2   sed         0         0         0         1
10  2   cur         1         0         0         0
11  2   sed         0         0         0         1
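The output above has one row per observation, whereas the question asks for one row per patient. A minimal sketch of one way to collapse it, assuming dplyr is also available: take the maximum of each dummy column within each id, so a drug received at least once yields 1. The name per_patient is only illustrative.

```r
library(dplyr)
library(fastDummies)

# Example data from the question
id <- c(rep(1, 7), rep(2, 4))
medoc <- c("par", "mor", "mor", "par", "sed", "sed",
           "sed", "cur", "sed", "cur", "sed")
mydata <- data.frame(id, medoc)

per_patient <- mydata %>%
  # one 0/1 column per drug; drop the original medoc column
  dummy_cols(select_columns = "medoc", remove_selected_columns = TRUE) %>%
  group_by(id) %>%
  # max over the rows of a patient: 1 if the drug appears at least once
  summarise(across(starts_with("medoc_"), max))

per_patient
```

Because no drug name is written out by hand, the same code should work unchanged with dozens of drugs.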

And here is an answer with pivot_wider:

library(tidyr)
library(dplyr)
mydata %>% mutate(index = row_number()) %>%
  pivot_wider(names_from = medoc,
              values_from = medoc,
              values_fn = \(x) +!is.na(x),
              values_fill = 0)

A solution similar to @Guedes's, but with a different values_fn:

library(dplyr)
library(tidyr)

mydata %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = medoc, values_from = medoc,
              values_fn = function(x) 1, values_fill = 0) %>%
  select(-row)
# A tibble: 11 x 5
      id   par   mor   sed   cur
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     0     0     0
 2     1     0     1     0     0
 3     1     0     1     0     0
 4     1     1     0     0     0
 5     1     0     0     1     0
 6     1     0     0     1     0
 7     1     0     0     1     0
 8     2     0     0     0     1
 9     2     0     0     1     0
10     2     0     0     0     1
11     2     0     0     1     0
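Both pivot_wider answers keep one row per observation. If one row per patient is wanted instead, as the question describes, a possible variation is to drop the row index and de-duplicate the (id, medoc) pairs before pivoting. This is only a sketch; the helper column value and the name wide are illustrative.

```r
library(dplyr)
library(tidyr)

# Example data from the question
id <- c(rep(1, 7), rep(2, 4))
medoc <- c("par", "mor", "mor", "par", "sed", "sed",
           "sed", "cur", "sed", "cur", "sed")
mydata <- data.frame(id, medoc)

wide <- mydata %>%
  distinct(id, medoc) %>%        # keep each patient/drug pair once
  mutate(value = 1) %>%          # marker for "received at least once"
  pivot_wider(names_from = medoc,
              values_from = value,
              values_fill = 0)   # drugs never received become 0

wide
```

The drug columns appear in order of first occurrence in the data (par, mor, sed, cur here).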

Assuming that you want one row for each id, with binary columns indicating which values of medoc are present (1) or absent (0), we can use table like this. (If you would like counts instead of presence/absence, omit the pmin.)

pmin(table(mydata), 1)
##    medoc
##  id  cur mor par sed
##   1   0   1   1   1
##   2   1   0   0   1

Or, as a data frame with medoc_str added:

library(dplyr)
library(tibble)

mydata %>%
  table %>%
  pmin(1) %>%
  as.data.frame.matrix %>%
  rowwise %>%
  mutate(medoc_str = paste(names(.)[c_across() == 1], collapse = " ")) %>%
  ungroup %>%  
  rownames_to_column(var = "id")
## # A tibble: 2 x 6
##   id      cur   mor   par   sed medoc_str  
##   <chr> <dbl> <dbl> <dbl> <dbl> <chr>      
## 1 1         0     1     1     1 mor par sed
## 2 2         1     0     0     1 cur sed    
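To make the counts versus presence/absence distinction concrete, here is a small base R sketch that builds both versions and adds id as an explicit column. The object names counts, present and present_df are illustrative.

```r
# Example data from the question
id <- c(rep(1, 7), rep(2, 4))
medoc <- c("par", "mor", "mor", "par", "sed", "sed",
           "sed", "cur", "sed", "cur", "sed")
mydata <- data.frame(id, medoc)

counts  <- table(mydata)     # how many times each id received each drug
present <- pmin(counts, 1)   # cap the counts at 1: received at least once

# Convert to a plain data frame and carry the ids as a regular column
present_df <- as.data.frame.matrix(present)
present_df <- cbind(id = rownames(present_df), present_df)

counts
present_df
```

Here counts holds 3 for id 1 and sed, while the corresponding entry of present_df is 1.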


 