Convert dummy variables to a single categorical variable in R?

Similar questions have been asked here, here, and here. However, they don't seem to cover exactly what I need. For example, if I have a dataset like so:

df <- data.frame(
  x = rnorm(10),
  y = rnorm(10),
  a = c(0,0,0,1,1,0,0,0,1,0),
  b = c(1,1,1,1,0,0,1,0,0,0),
  c = c(0,1,0,1,0,0,0,0,0,0),
  z = c(1,1,1,1,1,0,1,0,1,0)
)

What I'm trying to do is convert the variables a, b, and c to a single categorical variable whose levels are a, b, and c. But as you can see, sometimes two or more of these variables are set to 1 in the same row. So, what I'm trying to achieve is a data frame that would look something like this:

df <- data.frame(
  x = rnorm(10),
  y = rnorm(10),
  a = c(0,0,0,1,1,0,0,0,1,0),
  b = c(1,1,1,1,0,0,1,0,0,0),
  c = c(0,1,0,1,0,0,0,0,0,0),
  z = c("b","b,c","b","a,b,c","a",0,"b",0,"a",0)
)

I tried using:

apply(df[,c("a","b", "c")], 1, sum, na.rm=TRUE)

which counts how many of the dummy variables are set in each row, but I'm not sure how to combine two (or more) variables into a single factor level.

Any suggestions as to how I could do this?

Loop over the selected columns by row (MARGIN = 1), subset the column names where the value is 1, and paste them together with toString:

df$z <- apply(df[c('a', 'b', 'c')], 1, function(x) toString(names(x)[x == 1]))
df$z
#[1] "b"       "b, c"    "b"       "a, b, c" "a"       ""        "b"       ""        "a"       ""       
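
To see what the anonymous function does for each row, here is a minimal sketch that runs the same steps by hand on two rows. The names row4, row6, combo4, and combo6 are made up for illustration; inside apply, each row arrives as a named numeric vector like these.

```r
# Row 4 of the example data: all three dummies are set
row4 <- c(a = 1, b = 1, c = 1)
flagged <- names(row4)[row4 == 1]   # names of the columns whose value is 1
combo4 <- toString(flagged)         # collapse the names, separated by ", "

# Row 6 of the example data: no dummy is set
row6 <- c(a = 0, b = 0, c = 0)
combo6 <- toString(names(row6)[row6 == 1])  # nothing selected, so the result is ""

print(combo4)
print(combo6)
```

This is why rows with no dummy set come back as empty strings in the output above.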

If we want to replace the empty strings "" with '0':

df$z[df$z == ''] <- '0'
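
The steps above leave z as a character column. Since the question asks for a categorical variable, one possible follow-up (an assumption about what is wanted, not part of the original answer) is to convert it with factor(). The sketch below rebuilds only the dummy columns so that it runs on its own.

```r
# Only the dummy columns from the question; x and y are not needed here
df <- data.frame(
  a = c(0,0,0,1,1,0,0,0,1,0),
  b = c(1,1,1,1,0,0,1,0,0,0),
  c = c(0,1,0,1,0,0,0,0,0,0)
)

# Same steps as in the answer: build the label, then recode empty strings
df$z <- apply(df[c("a", "b", "c")], 1, function(x) toString(names(x)[x == 1]))
df$z[df$z == ""] <- "0"

# Assumed extra step: turn the character column into a factor
df$z <- factor(df$z)

levels(df$z)   # one level per distinct combination that occurs in the data
table(df$z)    # number of rows per combination
```

Each combination that occurs in the data, such as "b, c", becomes its own level, so the number of levels can grow quickly when there are many dummy columns.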

For a solution with purrr and dplyr:

library(dplyr); library(purrr)  # provide %>%, mutate, select and pmap_chr
df %>% mutate(z = pmap_chr(select(., a, b, c), ~ {v1 <- c(...); toString(names(v1)[v1 == 1])}))
