I would like to create dummy variables, each identifying one specific company (enterprise) in the variable "empresa". For example, create a new variable "GLO" that takes the value 1 if "empresa" equals "GLO" and 0 otherwise.
The data look like this:
head(tarifas)
ano mes empresa origem destino tarifa assentos
1 2002 1 GLO SBPA SBBR 397,00 51
2 2002 1 AZU SBSV SBRF 272,00 5
3 2002 1 GLO SBFL SBGL 223,00 196
4 2002 1 TAM SBGL SBSP 96,00 615
5 2002 1 GLO SBGL SBRF 340,00 297
6 2002 1 AVI SBSP SBFL 145,00 189
I tried combining dplyr with a for loop, but something went wrong. For example, to create indicators for the companies GLO and AZU, I used the following code:
for (k in c("GLO", "AZU")) {
  tarifas2 <- tarifas %>%
    mutate(paste0(k) = 0) %>%
    mutate(replace(paste0(k), empresa == paste0(",k,"), 1))
}
I also tried the following:
tarifas <- cbind(tarifas, model.matrix(~ 0 + empresa, tarifas))
but because I am working with a large dataset, I ran into memory problems. Besides, I don't want a dummy for every distinct value of "empresa", only for a few selected companies.
The expected output is something like this:
ano mes empresa origem destino tarifa assentos GLO AZU
1 2002 1 GLO SBPA SBBR 397,00 51 1 0
2 2002 1 AZU SBSV SBRF 272,00 5 0 1
3 2002 1 GLO SBFL SBGL 223,00 196 1 0
4 2002 1 TAM SBGL SBSP 96,00 615 0 0
5 2002 1 GLO SBGL SBRF 340,00 297 1 0
6 2002 1 AVI SBSP SBFL 145,00 189 0 0
Thank you in advance.
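If we want to create the new columns in a loop, we can unquote the name stored in k with !! and assign it with :=, so the value of k (not the letter "k") becomes the column name:
library(dplyr)
for (k in c("GLO", "AZU")) {
  tarifas <- tarifas %>%
    mutate(!! k := as.integer(empresa == k))
}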
tarifas
# ano mes empresa origem destino tarifa assentos GLO AZU
#1 2002 1 GLO SBPA SBBR 397,00 51 1 0
#2 2002 1 AZU SBSV SBRF 272,00 5 0 1
#3 2002 1 GLO SBFL SBGL 223,00 196 1 0
#4 2002 1 TAM SBGL SBSP 96,00 615 0 0
#5 2002 1 GLO SBGL SBRF 340,00 297 1 0
#6 2002 1 AVI SBSP SBFL 145,00 189 0 0
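For comparison, here is a minimal base R sketch of the same loop, assuming tarifas is a plain data.frame (only two of the example columns are reproduced here, for brevity). Because [[<- accepts the column name as a string, no !! or := is needed:

```r
# Small reproduction of the example data (subset of columns, for illustration)
tarifas <- data.frame(
  empresa  = c("GLO", "AZU", "GLO", "TAM", "GLO", "AVI"),
  assentos = c(51L, 5L, 196L, 615L, 297L, 189L)
)

# Companies that should get their own 0/1 column
firms <- c("GLO", "AZU")

for (k in firms) {
  # empresa == k is a logical vector; as.integer() turns TRUE/FALSE into 1/0
  tarifas[[k]] <- as.integer(tarifas$empresa == k)
}

print(tarifas)
```

Only the listed companies get a column, and each one is a single integer vector, so memory grows with the number of selected companies rather than with the number of distinct values in empresa.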
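We can also do this without a loop using pivot_wider from tidyr. Note that this creates a column for every company in "empresa" (not only GLO and AZU) and that the "empresa" column itself is consumed by the pivot:
library(tidyr)
tarifas %>%
  mutate(rn = row_number(), val = 1) %>%  # rn keeps every row unique after pivoting
  pivot_wider(names_from = empresa,
              values_from = val, values_fill = list(val = 0))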
# A tibble: 6 x 11
# ano mes origem destino tarifa assentos rn GLO AZU TAM AVI
# <int> <int> <chr> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
#1 2002 1 SBPA SBBR 397,00 51 1 1 0 0 0
#2 2002 1 SBSV SBRF 272,00 5 2 0 1 0 0
#3 2002 1 SBFL SBGL 223,00 196 3 1 0 0 0
#4 2002 1 SBGL SBSP 96,00 615 4 0 0 1 0
#5 2002 1 SBGL SBRF 340,00 297 5 1 0 0 0
#6 2002 1 SBSP SBFL 145,00 189 6 0 0 0 1
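If the original "empresa" column should survive the pivot, one option is to pivot on a copy of it and drop the helper rn afterwards. A minimal sketch, assuming tidyr 1.0 or later; firm is a hypothetical helper column name and only some of the example columns are reproduced:

```r
library(dplyr)
library(tidyr)

# Subset of the example data, for illustration
tarifas <- data.frame(
  ano = 2002L, mes = 1L,
  empresa = c("GLO", "AZU", "GLO", "TAM", "GLO", "AVI"),
  assentos = c(51L, 5L, 196L, 615L, 297L, 189L)
)

out <- tarifas %>%
  # rn keeps every row unique; firm is a copy of empresa used for the new names
  mutate(rn = row_number(), firm = empresa, val = 1L) %>%
  pivot_wider(names_from = firm, values_from = val,
              values_fill = list(val = 0L)) %>%
  # the helper row id is no longer needed
  select(-rn)

out
```

Because empresa is not used in names_from, pivot_wider treats it as an id column and keeps it. This still creates one column per company, so for a large number of distinct companies the loop above is the lighter option.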
Data used in the examples:
tarifas <- structure(list(ano = c(2002L, 2002L, 2002L, 2002L, 2002L, 2002L
), mes = c(1L, 1L, 1L, 1L, 1L, 1L), empresa = c("GLO", "AZU",
"GLO", "TAM", "GLO", "AVI"), origem = c("SBPA", "SBSV", "SBFL",
"SBGL", "SBGL", "SBSP"), destino = c("SBBR", "SBRF", "SBGL",
"SBSP", "SBRF", "SBFL"), tarifa = c("397,00", "272,00", "223,00",
"96,00", "340,00", "145,00"), assentos = c(51L, 5L, 196L, 615L,
297L, 189L)), class = "data.frame", row.names = c("1", "2", "3",
"4", "5", "6"))
Another option is dplyr::case_when():
tarifas <- tarifas %>%
  mutate(GLO = case_when(
           empresa == 'GLO' ~ 1,
           empresa != 'GLO' ~ 0),
         AZU = case_when(
           empresa == 'AZU' ~ 1,
           empresa != 'AZU' ~ 0)
  )
Simply choose the values from empresa you want to create a column for.
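The case_when() calls can be shortened with a TRUE ~ 0 catch-all instead of spelling out the negated condition. A sketch, assuming dplyr is loaded; the NA in the example data is a hypothetical missing value added only to show the one behavioral difference: case_when() treats an NA condition as not matching, so with the catch-all a missing empresa becomes 0, while the two-condition version returns NA:

```r
library(dplyr)

# Example companies; the trailing NA is a hypothetical missing value
# added only to show how the two styles differ
tarifas <- data.frame(
  empresa = c("GLO", "AZU", "GLO", "TAM", "GLO", "AVI", NA)
)

tarifas <- tarifas %>%
  mutate(
    # catch-all: every row not matched by the first condition (including NA) gets 0
    GLO = case_when(empresa == "GLO" ~ 1, TRUE ~ 0),
    AZU = case_when(empresa == "AZU" ~ 1, TRUE ~ 0),
    # two explicit conditions, as in the answer: NA matches neither, so NA
    GLO_two_conds = case_when(empresa == "GLO" ~ 1, empresa != "GLO" ~ 0)
  )

tarifas
```

Which behavior is preferable depends on whether a missing company should count as "not GLO" or stay unknown.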