How to construct a function for creating dummy variables?

I have a data frame, and the following code creates dummy variables from its year column.

library(dummies)
df1 <- data.frame(id = 1:4, year = 1991:1994)
df1 <- cbind(df1, dummy(df1$year, sep = "_"))
df1
#    id year df1_1991 df1_1992 df1_1993 df1_1994
#1  1 1991        1        0        0        0
#2  2 1992        0        1        0        0
#3  3 1993        0        0        1        0
#4  4 1994        0        0        0        1

I tried to write a function to achieve the same result.

dummy_df <- function(dframe, x){
    dframe <- cbind(dframe, dummy(dframe$x, sep = "_"))
    return(dframe)
}

However, when I run the function, I get the following error.

dummy_df(df1, year)
#Error in `[[.default`(x, 1) : subscript out of bounds

How can I fix this mistake and write a general function for creating dummy variables? Additionally, it would be better if the function provided an option to keep or discard the original column from which the dummy variables are created. For example, for the data frame above, the keep-or-discard option would apply to the column year.

This question was posted after looking at a similar question: Pass a data.frame column name to a function.

The problem is that dframe$x does not evaluate x: it looks for a column literally named x. In addition, when year is passed unquoted, it is a symbol representing a variable, not a character string holding a column name. A standard trick to get a character string is deparse(substitute(.)). Then the extractor [[ works.

dummy_df <- function(dframe, x){
    x <- deparse(substitute(x))
    dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
#  id year df1_1991 df1_1992 df1_1993 df1_1994
#1  1 1991        1        0        0        0
#2  2 1992        0        1        0        0
#3  3 1993        0        0        1        0
#4  4 1994        0        0        0        1
#Warning message:
#In model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE) :
#  non-list contrasts argument ignored
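
To see why the original version failed, the following sketch may help. It uses base R only, and the helper names show_dollar and show_name are hypothetical, introduced just for illustration. It contrasts dframe$x, which looks for a column literally called x, with the name recovered by deparse(substitute(x)).

```r
df <- data.frame(id = 1:4, year = 1991:1994)

# `$` does not evaluate its right-hand side: this looks for a column named "x"
show_dollar <- function(dframe, x) dframe$x

# substitute() captures the unevaluated argument; deparse() turns it into a string
show_name <- function(dframe, x) deparse(substitute(x))

is.null(show_dollar(df, year))   # TRUE: there is no column called "x"
show_name(df, year)              # "year"
df[[show_name(df, year)]]        # the year column, 1991 to 1994
```

Because x is never evaluated in either helper, passing the bare name year does not trigger an "object not found" error.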

If the column x may also be passed quoted, change deparse(substitute(.)) in the function above to as.character(substitute(.)). The function will then accept both quoted and unquoted x.

dummy_df <- function(dframe, x){
    x <- as.character(substitute(x))
    dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
dummy_df(df1, "year")
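
The difference between the two idioms can be checked directly. In this sketch (the helper names name_deparse and name_char are hypothetical), deparse() keeps the quotation marks of a quoted argument, while as.character() returns the bare name in both cases.

```r
name_deparse <- function(x) deparse(substitute(x))
name_char    <- function(x) as.character(substitute(x))

name_deparse(year)     # "year"
name_deparse("year")   # "\"year\"": quotes included, so [[ would not find the column
name_char(year)        # "year"
name_char("year")      # "year"
```

Note that neither idiom resolves a variable that holds the column name: with col <- "year", name_char(col) returns "col", not "year".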

Edit

Following the OP's comment, keeping or removing the column x can be handled with an extra function argument, keep, defaulting to TRUE.

dummy_df <- function(dframe, x, keep = TRUE){
    x <- as.character(substitute(x))
    # match() finds the exact column name; grep() would also match partial names
    i <- match(x, names(dframe))
    if(is.na(i)) stop(paste(sQuote(x), "is not a valid column"))
    # keep = FALSE drops the original column before binding the dummies
    dftmp <- if(keep) dframe else dframe[-i]
    dframe <- cbind(dftmp, dummy(dframe[[x]], sep = "_"))
    return(dframe)
}

dummy_df(df1, year)
dummy_df(df1, "year")

dummy_df(df1, year, keep = FALSE)
# df1 has no column named month, so this call stops with an error
dummy_df(df1, month, keep = FALSE)
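
For readers who would rather not depend on the dummies package, here is a minimal sketch of the same idea in base R. The function name dummy_df_base and its sep argument are hypothetical, not part of the answer above. It assumes the column has no missing values and at least two distinct values, and it prefixes the new columns with the column name (year_1991) rather than the data frame name (df1_1991).

```r
# Hypothetical base-R variant of dummy_df(); the name dummy_df_base is made up here
dummy_df_base <- function(dframe, x, keep = TRUE, sep = "_"){
    x <- as.character(substitute(x))
    if(!x %in% names(dframe)) stop(paste(sQuote(x), "is not a valid column"))
    # One indicator column per level; "- 1" drops the intercept
    f <- factor(dframe[[x]])
    dummies <- model.matrix(~ f - 1)
    colnames(dummies) <- paste(x, levels(f), sep = sep)
    # keep = FALSE removes the original column by exact name
    if(!keep) dframe <- dframe[names(dframe) != x]
    cbind(dframe, dummies)
}

df1 <- data.frame(id = 1:4, year = 1991:1994)
res <- dummy_df_base(df1, year, keep = FALSE)
names(res)   # "id" "year_1991" "year_1992" "year_1993" "year_1994"
```

The column check runs regardless of keep, so an invalid name fails early with a readable message.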
