
How to construct a function for creating dummy variables?

I have a data frame, and the following code creates dummy variables from its year column.

library(dummies)
df1 <- data.frame(id = 1:4, year = 1991:1994)
df1 <- cbind(df1, dummy(df1$year, sep = "_"))
df1
#    id year df1_1991 df1_1992 df1_1993 df1_1994
#1  1 1991        1        0        0        0
#2  2 1992        0        1        0        0
#3  3 1993        0        0        1        0
#4  4 1994        0        0        0        1

I tried to write a function that does the same.

dummy_df <- function(dframe, x){
    dframe <- cbind(dframe, dummy(dframe$x, sep = "_"))
    return(dframe)
}

However, when I call the function, I get the following error.

dummy_df(df1, year)
#Error in `[[.default`(x, 1) : subscript out of bounds

How can I fix this mistake and write a general function for creating dummy variables? Additionally, it would be better if the function provided an option to keep or discard the original column from which the dummy variables are created. For example, for the data frame above, the option would apply to the column year.

This question was posted after reading a similar question: Pass a data.frame column name to a function.

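The problem is twofold. Inside the function, dframe$x looks for a column literally named x, because $ does not evaluate its argument, so it returns NULL. And when year is passed unquoted, it is a symbol representing a variable, not a character string holding the column name. A standard trick to get the name as a character string is deparse(substitute(x)). The extractor [[ then works with that string.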

dummy_df <- function(dframe, x){
    x <- deparse(substitute(x))
    dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
#  id year df1_1991 df1_1992 df1_1993 df1_1994
#1  1 1991        1        0        0        0
#2  2 1992        0        1        0        0
#3  3 1993        0        0        1        0
#4  4 1994        0        0        0        1
#Warning message:
#In model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE) :
#  non-list contrasts argument ignored
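To see why the original version failed, the two lookups can be compared side by side. This is only an illustrative sketch in base R; show_lookup is a hypothetical helper, not part of the answer or of the dummies package.

```r
df1 <- data.frame(id = 1:4, year = 1991:1994)

# Hypothetical helper: returns what each kind of lookup finds
show_lookup <- function(dframe, x) {
    name <- deparse(substitute(x))   # the unquoted argument as a string
    list(
        name    = name,
        dollar  = dframe$x,          # looks for a column literally called "x"
        bracket = dframe[[name]]     # looks up the column named by `name`
    )
}

res <- show_lookup(df1, year)
res$name      # "year"
res$dollar    # NULL, because df1 has no column called x
res$bracket   # the values of df1$year
```

The NULL returned by dframe$x is what the original function passed on to dummy().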

If the column name may also be passed quoted, use as.character(substitute(x)) instead of deparse(substitute(x)). The function then accepts both quoted and unquoted x.

dummy_df <- function(dframe, x){
    x <- as.character(substitute(x))
    dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
dummy_df(df1, "year")
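The difference between the two conversions shows up only for a quoted argument. The small base R sketch below uses two hypothetical one-line functions, f_deparse and f_aschar, purely for illustration.

```r
# Hypothetical helpers that only return the captured argument as a string
f_deparse <- function(x) deparse(substitute(x))
f_aschar  <- function(x) as.character(substitute(x))

f_deparse(year)     # "year"
f_deparse("year")   # "\"year\"" : the quotes become part of the string
f_aschar(year)      # "year"
f_aschar("year")    # "year"
```

Note that as.character(substitute(x)) gives a single string only when x is a bare name or a string. For an expression such as df1$year it returns several elements, so it is not a general replacement.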

Edit

Following the OP's comment, keeping or removing the column x can be handled with an extra function argument, keep, defaulting to TRUE.

dummy_df <- function(dframe, x, keep = TRUE){
    x <- as.character(substitute(x))
    # exact name match; grep() would also match partial names such as "year2"
    i <- match(x, names(dframe))
    # validate up front so the check also runs when keep = TRUE
    if(is.na(i)) stop(paste(sQuote(x), "is not a valid column"))
    if(keep){
        dftmp <- dframe
    } else {
        dftmp <- dframe[-i]
    }
    dframe <- cbind(dftmp, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
dummy_df(df1, "year")

dummy_df(df1, year, keep = FALSE)
dummy_df(df1, month, keep = FALSE)  # stops with an error: df1 has no column named month
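If the dummies package is not available, roughly the same behaviour can be sketched in base R. This is an assumption-laden variant, not the answer's method: dummy_df_base is a hypothetical name, the new columns are prefixed with the column name (year_1991) rather than df1_, and missing values in the column are not handled.

```r
# Hypothetical base-R variant of dummy_df; does not use the dummies package
dummy_df_base <- function(dframe, x, keep = TRUE, sep = "_") {
    x <- as.character(substitute(x))      # accepts year or "year"
    i <- match(x, names(dframe))          # exact column-name match
    if (is.na(i)) stop(paste(sQuote(x), "is not a valid column"))

    # One 0/1 indicator column per distinct value, in sorted order
    lev <- sort(unique(dframe[[x]]))
    ind <- outer(dframe[[x]], lev, "==") * 1L
    colnames(ind) <- paste(x, lev, sep = sep)

    # Keep or drop the original column before attaching the indicators
    dftmp <- if (keep) dframe else dframe[-i]
    cbind(dftmp, ind)
}

df1 <- data.frame(id = 1:4, year = 1991:1994)
res_keep <- dummy_df_base(df1, year)
res_drop <- dummy_df_base(df1, "year", keep = FALSE)
names(res_drop)
```

With keep = FALSE the result has the id column followed by the four indicator columns. With the default keep = TRUE the year column is retained as well.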
