I am searching for an optimal way to convert variable categories into dummy variables

I have some patients who receive different treatments at different times. I want to convert the treatments they have received into binary variables, one per drug, each taking the value 1 if the patient has received the drug at least once and 0 if they have never received it.

I managed to do this, but in a tedious way that would be hard to maintain with dozens of different drugs.

I would like to optimize my code, mainly to avoid creating the binary variables for each drug one by one.

library(dplyr)  # provides %>%, group_by, mutate and distinct used below

id<-c(rep(1,7),rep(2,4))
medoc<-c("par","mor","mor","par","sed","sed",
         "sed","cur","sed","cur","sed")

mydata<-data.frame(id,medoc)

mydata2<-mydata%>%group_by(id)%>%
  mutate(medoc_str=paste(unique(medoc),collapse  = " "))%>%
  distinct(id,.keep_all = TRUE)

mydata2$par<-NA
mydata2$mor<-NA
mydata2$sed<-NA
mydata2$cur<-NA

mydata2$par<-ifelse(
  grepl("par",mydata2$medoc_str)==TRUE,1,0
)

mydata2$mor<-ifelse(
  grepl("mor",mydata2$medoc_str)==TRUE,1,0
)

mydata2$sed<-ifelse(
  grepl("sed",mydata2$medoc_str)==TRUE,1,0
)

mydata2$cur<-ifelse(
  grepl("cur",mydata2$medoc_str)==TRUE,1,0
)

If I understand correctly, you want to dummify your variables. This can be done with tidyr::pivot_wider too, but I prefer dedicated packages that make it very easy. I like the fastDummies package:

library(fastDummies)

dummy_cols(mydata, select_columns = 'medoc')

   id medoc medoc_cur medoc_mor medoc_par medoc_sed
1   1   par         0         0         1         0
2   1   mor         0         1         0         0
3   1   mor         0         1         0         0
4   1   par         0         0         1         0
5   1   sed         0         0         0         1
6   1   sed         0         0         0         1
7   1   sed         0         0         0         1
8   2   cur         1         0         0         0
9   2   sed         0         0         0         1
10  2   cur         1         0         0         0
11  2   sed         0         0         0         1
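The output above still has one row per observation, whereas the question asks for one row per patient. A minimal sketch of one way to collapse it, assuming dplyr is available: take the maximum of each dummy column within each id. The name per_patient is only illustrative.

```r
library(dplyr)
library(fastDummies)

# Same example data as in the question
mydata <- data.frame(
  id    = c(rep(1, 7), rep(2, 4)),
  medoc = c("par", "mor", "mor", "par", "sed", "sed",
            "sed", "cur", "sed", "cur", "sed")
)

# Row-level dummies, then the per-patient maximum of each dummy:
# the result is 1 if the drug appears at least once for that id, else 0
per_patient <- dummy_cols(mydata, select_columns = "medoc") %>%
  group_by(id) %>%
  summarise(across(starts_with("medoc_"), max), .groups = "drop")

per_patient
```

Taking the maximum works because each dummy is 0 or 1, so any single 1 within a patient's rows wins.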

And here is an answer with pivot_wider:

library(tidyr)
library(dplyr)
mydata %>% mutate(index = row_number()) %>%
  pivot_wider(names_from = medoc,
              values_from = medoc,
              values_fn = \(x) +!is.na(x),
              values_fill = 0)

A solution similar to @Guedes's but with a different values_fn:

library(dplyr)
library(tidyr)

mydata %>%
  mutate(row = row_number()) %>%
  pivot_wider(names_from = medoc, values_from = medoc,
              values_fn = function(x) 1, values_fill = 0) %>%
  select(-row)
# A tibble: 11 x 5
      id   par   mor   sed   cur
   <dbl> <dbl> <dbl> <dbl> <dbl>
 1     1     1     0     0     0
 2     1     0     1     0     0
 3     1     0     1     0     0
 4     1     1     0     0     0
 5     1     0     0     1     0
 6     1     0     0     1     0
 7     1     0     0     1     0
 8     2     0     0     0     1
 9     2     0     0     1     0
10     2     0     0     0     1
11     2     0     0     1     0
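If one row per patient is wanted rather than one row per observation, a possible variation (a sketch, not part of the answer above) is to deduplicate the id/medoc pairs first and pivot without a row index. The helper column received and the name per_patient are made up for this illustration.

```r
library(dplyr)
library(tidyr)

# Same example data as in the question
mydata <- data.frame(
  id    = c(rep(1, 7), rep(2, 4)),
  medoc = c("par", "mor", "mor", "par", "sed", "sed",
            "sed", "cur", "sed", "cur", "sed")
)

per_patient <- mydata %>%
  distinct(id, medoc) %>%          # keep each patient/drug pair once
  mutate(received = 1) %>%         # flag the pairs that exist
  pivot_wider(names_from = medoc,
              values_from = received,
              values_fill = 0)     # drugs never received become 0

per_patient
```

Because each id/medoc pair is unique after distinct, no values_fn is needed, and new drugs in the data automatically become new columns.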

Assuming that you want one row for each id, with binary columns indicating which values of medoc are present (1) or absent (0), we can use table like this. (If you would like counts instead of presence/absence, omit the pmin.)

pmin(table(mydata), 1)
##    medoc
##  id  cur mor par sed
##   1   0   1   1   1
##   2   1   0   0   1
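To make the two steps explicit, here is a small base R sketch that keeps the intermediate counts and then caps them at 1. The names counts, flags and flags_df are only illustrative.

```r
# Same example data as in the question
mydata <- data.frame(
  id    = c(rep(1, 7), rep(2, 4)),
  medoc = c("par", "mor", "mor", "par", "sed", "sed",
            "sed", "cur", "sed", "cur", "sed")
)

# Step 1: cross-tabulate id by medoc, giving how many times
# each patient received each drug
counts <- table(mydata)

# Step 2: cap every count at 1 so the table holds presence/absence
flags <- pmin(counts, 1)

# Optional: plain data frame with one row per patient,
# with the patient id stored in the row names
flags_df <- as.data.frame.matrix(flags)
flags_df
```

Keeping counts around is handy if the number of administrations is needed later, since flags can always be derived from it but not the reverse.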

or as a data frame, adding medoc_str:

library(dplyr)
library(tibble)

mydata %>%
  table %>%
  pmin(1) %>%
  as.data.frame.matrix %>%
  rowwise %>%
  mutate(medoc_str = paste(names(.)[c_across() == 1], collapse = " ")) %>%
  ungroup %>%  
  rownames_to_column(var = "id")
## # A tibble: 2 x 6
##   id      cur   mor   par   sed medoc_str  
##   <chr> <dbl> <dbl> <dbl> <dbl> <chr>      
## 1 1         0     1     1     1 mor par sed
## 2 2         1     0     0     1 cur sed    
