How to create a group of dummy variables that identifies a specific value in another variable?

Question

I would like to create dummy variables that each identify one specific enterprise in the variable "empresa". For example, a new variable "GLO" should take the value 1 if "empresa" equals "GLO" and 0 otherwise.

The data structure is as follows:

head(tarifas)


   ano mes empresa origem destino tarifa assentos
1 2002   1     GLO   SBPA    SBBR 397,00       51
2 2002   1     AZU   SBSV    SBRF 272,00        5
3 2002   1     GLO   SBFL    SBGL 223,00      196
4 2002   1     TAM   SBGL    SBSP  96,00      615
5 2002   1     GLO   SBGL    SBRF 340,00      297
6 2002   1     AVI   SBSP    SBFL 145,00      189

I tried combining dplyr with a for loop, but it did not work. For example, to create indicators for the enterprises GLO and AZU, I used the following code:

for (k in c("GLO", "AZU")) {
 tarifas2<- tarifas %>%
  mutate(paste0(k) = 0) %>%
  mutate(replace(paste0(k), empresa == paste0(",k,"),1))
}

I also tried the following code:

tarifas<- cbind(tarifas,model.matrix( ~ 0 + empresa, tarifas))

However, I am working with a large dataset and ran into a memory issue. I do not want one dummy for every distinct value of "empresa", only for some of the enterprises.

The expected output is something like:

   ano mes empresa origem destino tarifa assentos GLO AZU
1 2002   1     GLO   SBPA    SBBR 397,00       51   1   0
2 2002   1     AZU   SBSV    SBRF 272,00        5   0   1
3 2002   1     GLO   SBFL    SBGL 223,00      196   1   0
4 2002   1     TAM   SBGL    SBSP  96,00      615   0   0
5 2002   1     GLO   SBGL    SBRF 340,00      297   1   0
6 2002   1     AVI   SBSP    SBFL 145,00      189   0   0

Thank you in advance.

Answer 1

If we want to create the new columns in a loop, the column name held in k can be unquoted with !! on the left-hand side of := :

library(dplyr)

# Add one 0/1 indicator column per company named in the vector
for (k in c("GLO", "AZU")) {
  tarifas <- tarifas %>%
    mutate(!!k := as.integer(empresa == k))
}


tarifas
#   ano mes empresa origem destino tarifa assentos GLO AZU
#1 2002   1     GLO   SBPA    SBBR 397,00       51   1   0
#2 2002   1     AZU   SBSV    SBRF 272,00        5   0   1
#3 2002   1     GLO   SBFL    SBGL 223,00      196   1   0
#4 2002   1     TAM   SBGL    SBSP  96,00      615   0   0
#5 2002   1     GLO   SBGL    SBRF 340,00      297   1   0
#6 2002   1     AVI   SBSP    SBFL 145,00      189   0   0
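
The same indicators can probably be built without dplyr as well. The following is a minimal base R sketch, not part of the original answer; the vector name wanted is a hypothetical choice, and the data frame is a trimmed copy of the example data. It only creates columns for the companies listed, which is what the question asks for.

```r
# Trimmed copy of the example data (only the columns needed here).
tarifas <- data.frame(
  empresa  = c("GLO", "AZU", "GLO", "TAM", "GLO", "AVI"),
  assentos = c(51L, 5L, 196L, 615L, 297L, 189L),
  stringsAsFactors = FALSE
)

wanted <- c("GLO", "AZU")  # hypothetical name: the companies to flag

# One integer 0/1 column per requested company; other companies get no column.
tarifas[wanted] <- lapply(wanted, function(k) as.integer(tarifas$empresa == k))

tarifas
```

Because the comparison is vectorized and runs once per requested company, no intermediate matrix with a column for every company is built.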

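Alternatively, the dummies can be created without a loop using pivot_wider. Note that this produces one column for every distinct value of empresa and drops the original empresa column: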

library(tidyr)
tarifas %>%
  mutate(rn = row_number(), val = 1) %>%
  pivot_wider(names_from = empresa,
              values_from = val, values_fill = list(val = 0))
# A tibble: 6 x 11
#    ano   mes origem destino tarifa assentos    rn   GLO   AZU   TAM   AVI
#  <int> <int> <chr>  <chr>   <chr>     <int> <int> <dbl> <dbl> <dbl> <dbl>
#1  2002     1 SBPA   SBBR    397,00       51     1     1     0     0     0
#2  2002     1 SBSV   SBRF    272,00        5     2     0     1     0     0
#3  2002     1 SBFL   SBGL    223,00      196     3     1     0     0     0
#4  2002     1 SBGL   SBSP    96,00       615     4     0     0     1     0
#5  2002     1 SBGL   SBRF    340,00      297     5     1     0     0     0
#6  2002     1 SBSP   SBFL    145,00      189     6     0     0     0     1
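
Since pivot_wider yields a column for every company, a matrix restricted to the companies of interest may be closer to what the question needs. This is only a sketch under that assumption, using base R's outer(); the name wanted is hypothetical and the data frame is reduced to the empresa column of the example data.

```r
# The empresa column from the example data.
tarifas <- data.frame(
  empresa = c("GLO", "AZU", "GLO", "TAM", "GLO", "AVI"),
  stringsAsFactors = FALSE
)

wanted <- c("GLO", "AZU")  # hypothetical name: the companies to flag

# Compare every row against every requested company.
# The result is a logical matrix with nrow(tarifas) rows and length(wanted) columns.
dummies <- outer(tarifas$empresa, wanted, "==")
storage.mode(dummies) <- "integer"  # TRUE/FALSE become 1/0
colnames(dummies) <- wanted

# Unlike pivot_wider, this keeps the empresa column in place.
tarifas <- cbind(tarifas, dummies)

tarifas
```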

data

tarifas <- structure(list(ano = c(2002L, 2002L, 2002L, 2002L, 2002L, 2002L
), mes = c(1L, 1L, 1L, 1L, 1L, 1L), empresa = c("GLO", "AZU", 
"GLO", "TAM", "GLO", "AVI"), origem = c("SBPA", "SBSV", "SBFL", 
"SBGL", "SBGL", "SBSP"), destino = c("SBBR", "SBRF", "SBGL", 
"SBSP", "SBRF", "SBFL"), tarifa = c("397,00", "272,00", "223,00", 
"96,00", "340,00", "145,00"), assentos = c(51L, 5L, 196L, 615L, 
297L, 189L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6"))

Answer 2

Another option is dplyr::case_when():

library(dplyr)

tarifas <- tarifas %>%
  mutate(
    GLO = case_when(
      empresa == 'GLO' ~ 1,
      empresa != 'GLO' ~ 0
    ),
    AZU = case_when(
      empresa == 'AZU' ~ 1,
      empresa != 'AZU' ~ 0
    )
  )

Simply choose the values from empresa you want to create a column for.
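
One detail of the case_when() version: a row where empresa is NA matches neither condition, so its dummy is NA rather than 0. If that matters, dplyr::if_else() with its missing argument is one possible variation. The sketch below is an illustration only; the small data frame with an NA row is hypothetical and not taken from the question.

```r
library(dplyr)

# Hypothetical data: the third row has a missing company code.
tarifas <- data.frame(
  empresa = c("GLO", "AZU", NA, "TAM"),
  stringsAsFactors = FALSE
)

tarifas <- tarifas %>%
  mutate(
    # missing = 0L sends NA comparisons to 0 instead of NA
    GLO = if_else(empresa == "GLO", 1L, 0L, missing = 0L),
    AZU = if_else(empresa == "AZU", 1L, 0L, missing = 0L)
  )

tarifas
```

Using 1L and 0L keeps the columns as integers, which is typically more compact than the doubles produced by plain 1 and 0.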

Answer 1: 2, accepted, 2019-09-18 17:35:58
Answer 2: 0, 2019-09-18 17:45:39