
Use dynamic name for new column/variable in `dplyr`

I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.

Example data from iris:

library(dplyr)
iris <- as_tibble(iris)

I've created a function to mutate my new columns from the Petal.Width variable:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df <- mutate(df, varname = Petal.Width * n)  ## problem arises here
    df
}

Now I create a loop to build my columns:

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

However, since mutate() treats varname as a literal column name, the loop creates only one new variable (called varname) instead of four (called petal.2 to petal.5).

How can I get mutate() to use my dynamic name as the variable name?

Since you are dynamically building a variable name as a character value, it makes more sense to do the assignment using standard data.frame indexing, which accepts character values for column names. For example:

multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    df[[varname]] <- with(df, Petal.Width * n)
    df
}

The mutate() function makes it very easy to name new columns via named arguments. But that assumes you know the name when you type the command. If you want to specify the column name dynamically, you also need to build the named argument.
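One way to build the named argument yourself is to put the expression in a named list and splice it into mutate() with `!!!`. This is a minimal sketch, assuming dplyr >= 0.7 (where `!!!` splicing is available) and using rlang::quo(), which ships with dplyr's rlang dependency:

```r
library(dplyr)

multipetal <- function(df, n) {
  # Capture the expression as a quosure; it remembers this function's
  # environment, so `n` is found even though it is not a column
  exprs <- list(rlang::quo(Petal.Width * n))
  # The list name becomes the argument name, i.e. the new column name
  names(exprs) <- paste("petal", n, sep = ".")
  # Splice the named list in, as if we had typed petal.<n> = Petal.Width * n
  mutate(df, !!!exprs)
}

out <- as_tibble(iris)
for (i in 2:5) {
  out <- multipetal(out, i)
}
names(out)
```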


dplyr version >= 1.0

With recent dplyr versions you can use the syntax from the glue package when naming arguments with :=. Here the {} in the name grabs the value by evaluating the expression inside.

multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}
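The glue-style braces can interpolate any variable visible in the function, not only `n`. A short sketch, assuming dplyr >= 1.0; `add_scaled` and its `prefix` argument are hypothetical names used for illustration:

```r
library(dplyr)

# Hypothetical helper: both the prefix and the multiplier are
# interpolated into the new column name with glue-style braces
add_scaled <- function(df, prefix, n) {
  mutate(df, "{prefix}.{n}" := Petal.Width * n)
}

out <- as_tibble(iris)
for (i in 2:5) {
  out <- add_scaled(out, "petal", i)
}
names(out)
```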

If you are passing a column name to your function, you can use {{}} in the name string as well as for the column itself:

meanofcol <- function(df, col) {
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
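To see what name the `{{col}}` interpolation produces, the call can be checked directly. A sketch, assuming dplyr >= 1.0; it repeats the function so the block runs on its own:

```r
library(dplyr)

meanofcol <- function(df, col) {
  # In the name string, {{col}} becomes the label of the argument
  # (here "Petal.Width"); on the right-hand side it refers to the column
  mutate(df, "Mean of {{col}}" := mean({{col}}))
}

res <- meanofcol(iris, Petal.Width)
tail(names(res), 1)
```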


dplyr version >= 0.7

Starting with version 0.7, dplyr allows you to use := to assign argument names dynamically. You can write your function as:

# --- dplyr version 0.7+---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    mutate(df, !!varname := Petal.Width * n)
}
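When the source column is also given as a string, `!!` on the left of := can be paired with the .data pronoun on the right. A sketch, assuming dplyr >= 0.7; `scale_col` is a hypothetical helper name:

```r
library(dplyr)

# Hypothetical helper: both the source column and the new column are strings
scale_col <- function(df, col, n) {
  new_name <- paste(col, n, sep = ".")
  # !!new_name := injects the string as the new column's name;
  # .data[[col]] looks up the existing column whose name is stored in `col`
  mutate(df, !!new_name := .data[[col]] * n)
}

res <- scale_col(iris, "Petal.Width", 2)
tail(names(res), 1)
```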

For more information, see the documentation available from vignette("programming", "dplyr").


dplyr (>= 0.3 & < 0.7)

Slightly earlier versions of dplyr (>= 0.3, < 0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the non-standard evaluation vignette for more information (vignette("nse")).

So here, the answer is to use mutate_() rather than mutate() and do:

# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    varval <- lazyeval::interp(~Petal.Width * n, n=n)
    mutate_(df, .dots= setNames(list(varval), varname))
}
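To see what the .dots argument receives, the same named list can be built in base R. This sketch swaps lazyeval::interp() for base bquote(), which likewise substitutes the current value of `n` into the expression (it returns a call rather than a formula), and evaluates the result with eval() instead of mutate_(), since the underscored verbs are deprecated in current dplyr:

```r
n <- 2
varname <- paste("petal", n, sep = ".")

# Substitute the value of n into the expression: Petal.Width * 2
varval <- bquote(Petal.Width * .(n))

# The named list that would be passed as .dots:
# name = new column, value = expression
dots <- setNames(list(varval), varname)
print(dots)

# Evaluate each expression with the data frame's columns in scope
df <- iris
for (nm in names(dots)) {
  df[[nm]] <- eval(dots[[nm]], df)
}
head(df)
```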

dplyr < 0.3

Note that this is also possible in the older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:

# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
    varname <- paste("petal", n , sep=".")
    pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
    do.call("mutate", pp)
}
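The do.call() trick does not depend on dplyr itself: any function that takes new columns as named arguments can be called this way. Below is a sketch that uses base R's transform() in place of the old dplyr mutate() (that substitution is an assumption made so the example runs without a pre-0.3 dplyr); `multipetal_base` is a hypothetical name:

```r
multipetal_base <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # Argument list: the data first, then one quoted expression whose
  # list name is the dynamically built column name
  args <- c(list(df), setNames(list(quote(Petal.Width * n)), varname))
  # do.call() builds transform(df, petal.<n> = Petal.Width * n); the
  # expression sees the columns, and `n` is found in this function's frame
  do.call("transform", args)
}

out <- iris
for (i in 2:5) {
  out <- multipetal_base(out, i)
}
head(out)
```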

In the new release of dplyr (0.6.0, expected in April 2017 at the time of writing), we can also do an assignment (:=) and pass a variable holding the column name by unquoting it (!!), so that its value rather than its own name is used:

library(dplyr)
multipetalN <- function(df, n){
    varname <- paste0("petal.", n)
    df %>%
        mutate(!!varname := Petal.Width * n)
}

data(iris)
iris1 <- tbl_df(iris)
iris2 <- tbl_df(iris)
for(i in 2:5) {
    iris2 <- multipetalN(df=iris2, n=i)
}

Checking the output against @MrFlick's multipetal applied to 'iris1':

for(i in 2:5) {
    iris1 <- multipetal(df=iris1, n=i)
}
identical(iris1, iris2)
#[1] TRUE

After a lot of trial and error, I found the pattern UQ(rlang::sym("some string here")) really useful for working with strings and dplyr verbs. It seems to work in a lot of surprising situations.

Here's an example with mutate. We want to create a function that adds together two columns, where you pass the function both column names as strings. We can use this pattern, together with the assignment operator :=, to do this.

## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2){
  mtcars %>% 
    mutate(UQ(rlang::sym(new_name)) :=  UQ(rlang::sym(name1)) +  UQ(rlang::sym(name2)))
}
mutate_values('test', 'mpg', 'cyl')
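In current rlang, `!!` is the documented operator in place of the older UQ(), and the .data pronoun can read a column whose name is a string. A sketch of the same function written that way, assuming dplyr >= 0.7:

```r
library(dplyr)

## `!!` injects the new name; .data[[...]] looks up the input columns by string
mutate_values <- function(new_name, name1, name2) {
  mtcars %>%
    mutate(!!new_name := .data[[name1]] + .data[[name2]])
}

res <- mutate_values("test", "mpg", "cyl")
head(res)
```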

The pattern works with other dplyr functions as well. Here's filter:

## filter a column by a value 
filter_values <- function(name, value){
  mtcars %>% 
    filter(UQ(rlang::sym(name)) != value)
}
filter_values('gear', 4)
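The same filter can be sketched with the .data pronoun instead of UQ(rlang::sym()), assuming dplyr >= 0.7:

```r
library(dplyr)

## .data[[name]] refers to the column whose name is stored in `name`
filter_values <- function(name, value) {
  mtcars %>%
    filter(.data[[name]] != value)
}

res <- filter_values("gear", 4)
table(res$gear)
```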

Or arrange:

## transform a variable and then sort by it 
arrange_values <- function(name, transform){
  mtcars %>% 
    arrange(UQ(rlang::sym(name)) %>%  UQ(rlang::sym(transform)))
}
arrange_values('mpg', 'sin')
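An alternative sketch for arrange that resolves the transform name with base R's match.fun() and the column with .data, assuming dplyr >= 0.7:

```r
library(dplyr)

## match.fun() turns the function name into the function itself,
## which is then applied to the column looked up via .data
arrange_values <- function(name, transform) {
  f <- match.fun(transform)
  mtcars %>%
    arrange(f(.data[[name]]))
}

res <- arrange_values("mpg", "sin")
head(sin(res$mpg))
```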

For select, you don't need to use the pattern. Instead, you can use !!:

## select a column 
select_name <- function(name){
  mtcars %>% 
    select(!!name)
}
select_name('mpg')
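For select, tidyselect's all_of() also accepts column names as a character vector, which extends naturally to several names. A sketch, assuming dplyr >= 1.0; `select_names` is a hypothetical name:

```r
library(dplyr)

## all_of() takes a character vector of column names
select_names <- function(cols) {
  mtcars %>%
    select(all_of(cols))
}

one <- select_names("mpg")
two <- select_names(c("mpg", "cyl"))
names(two)
```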

With rlang 0.4.0 we have the curly-curly operator ({{}}), which makes this very easy. When a dynamic column name shows up on the left-hand side of an assignment, use :=.

library(dplyr)
library(rlang)

iris1 <- tbl_df(iris)

multipetal <- function(df, n) {
   varname <- paste("petal", n , sep=".")
   mutate(df, {{varname}} := Petal.Width * n)
}

multipetal(iris1, 4)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.4
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>     <dbl>
# 1          5.1         3.5          1.4         0.2 setosa      0.8
# 2          4.9         3            1.4         0.2 setosa      0.8
# 3          4.7         3.2          1.3         0.2 setosa      0.8
# 4          4.6         3.1          1.5         0.2 setosa      0.8
# 5          5           3.6          1.4         0.2 setosa      0.8
# 6          5.4         3.9          1.7         0.4 setosa      1.6
# 7          4.6         3.4          1.4         0.3 setosa      1.2
# 8          5           3.4          1.5         0.2 setosa      0.8
# 9          4.4         2.9          1.4         0.2 setosa      0.8
#10          4.9         3.1          1.5         0.1 setosa      0.4
# … with 140 more rows

We can also pass quoted or unquoted variable names to be assigned as column names.

multipetal <- function(df, name, n) {
   mutate(df, {{name}} := Petal.Width * n)
}

multipetal(iris1, temp, 3)

# A tibble: 150 x 6
#   Sepal.Length Sepal.Width Petal.Length Petal.Width Species  temp
#          <dbl>       <dbl>        <dbl>       <dbl> <fct>   <dbl>
# 1          5.1         3.5          1.4         0.2 setosa  0.6  
# 2          4.9         3            1.4         0.2 setosa  0.6  
# 3          4.7         3.2          1.3         0.2 setosa  0.6  
# 4          4.6         3.1          1.5         0.2 setosa  0.6  
# 5          5           3.6          1.4         0.2 setosa  0.6  
# 6          5.4         3.9          1.7         0.4 setosa  1.2  
# 7          4.6         3.4          1.4         0.3 setosa  0.900
# 8          5           3.4          1.5         0.2 setosa  0.6  
# 9          4.4         2.9          1.4         0.2 setosa  0.6  
#10          4.9         3.1          1.5         0.1 setosa  0.3  
# … with 140 more rows

It works the same with:

multipetal(iris1, "temp", 3)

Here's another version, and it's arguably a bit simpler.

multipetal <- function(df, n) {
    varname <- paste("petal", n, sep=".")
    df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
    df
}

for(i in 2:5) {
    iris <- multipetal(df=iris, n=i)
}

> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1          5.1         3.5          1.4         0.2  setosa     0.4     0.6     0.8       1
2          4.9         3.0          1.4         0.2  setosa     0.4     0.6     0.8       1
3          4.7         3.2          1.3         0.2  setosa     0.4     0.6     0.8       1
4          4.6         3.1          1.5         0.2  setosa     0.4     0.6     0.8       1
5          5.0         3.6          1.4         0.2  setosa     0.4     0.6     0.8       1
6          5.4         3.9          1.7         0.4  setosa     0.8     1.2     1.6       2

I am also adding an answer that augments this a little, because I came to this entry when searching for an answer and it had almost what I needed; the rest I got via @MrFlick's answer and the R lazyeval vignettes.

I wanted to make a function that takes a data frame and a vector of column names (as strings) whose values should be converted from strings to Date objects. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.

Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.

library(dplyr)

dat <- data.frame(a="leave alone",
                  dt="2015-08-03 00:00:00",
                  dt2="2015-01-20 00:00:00")

# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
    for (col in dtnames) {
        varval <- sprintf("as.Date(%s)", col)
        df <- df %>% mutate_(.dots= setNames(list(varval), col))
    }
    return(df)
}

dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
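With dplyr >= 1.0 the loop and mutate_() can be replaced by across() over the named columns. A sketch under that version assumption, keeping the original function name and data:

```r
library(dplyr)

dat <- data.frame(a = "leave alone",
                  dt = "2015-08-03 00:00:00",
                  dt2 = "2015-01-20 00:00:00")

# across() applies as.Date to every column named in dtnames;
# all_of() accepts the names as a character vector
convertSelectDates <- function(df, dtnames = character(0)) {
  mutate(df, across(all_of(dtnames), as.Date))
}

res <- convertSelectDates(dat, c("dt", "dt2"))
str(res)
```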

You may enjoy the package friendlyeval, which presents a simplified tidy eval API and documentation for newer or casual dplyr users.

You are creating strings that you want mutate to treat as column names. So using friendlyeval you could write:

library(dplyr)
library(friendlyeval)

multipetal <- function(df, n) {
  varname <- paste("petal", n , sep=".")
  df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}

Under the hood, this calls rlang functions that check that varname is legal as a column name.

friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.
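For comparison, a plain tidy eval version of the function above might look like the following sketch. It uses rlang::sym() to turn the string into a column symbol; this is an illustration, not the literal output of the addin:

```r
library(dplyr)

multipetal <- function(df, n) {
  varname <- paste("petal", n, sep = ".")
  # rlang::sym() converts the string to a symbol; !! injects it as the new name
  mutate(df, !!rlang::sym(varname) := Petal.Width * n)
}

out <- as_tibble(iris)
for (i in 2:5) {
  out <- multipetal(out, i)
}
names(out)
```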

While I enjoy using dplyr interactively, I find it extraordinarily tricky to do this with dplyr because you have to jump through hoops with workarounds such as lazyeval::interp() and setNames.

Here is a simpler version using base R, in which it seems more intuitive, to me at least, to put the loop inside the function, and which extends @MrFlick's solution.

multipetal <- function(df, n) {
   for (i in 1:n){
      varname <- paste("petal", i , sep=".")
      df[[varname]] <- with(df, Petal.Width * i)
   }
   df
}
multipetal(iris, 3) 
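In base R the loop can also be replaced by assigning a list of vectors to several new columns at once. A sketch of that variant, keeping the same function name and behaviour (columns petal.1 to petal.n):

```r
multipetal <- function(df, n) {
  varnames <- paste("petal", seq_len(n), sep = ".")
  # One vector per multiplier; assigning a list to df[varnames]
  # adds all the new columns in a single step
  df[varnames] <- lapply(seq_len(n), function(i) df$Petal.Width * i)
  df
}

res <- multipetal(iris, 3)
head(res)
```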

Another alternative: use {} inside quotation marks to create dynamic names easily. This is similar to other solutions but not exactly the same, and I find it easier.

library(dplyr)
library(tibble)

iris <- as_tibble(iris)

multipetal <- function(df, n) {
  df <- mutate(df, "petal.{n}" := Petal.Width * n)
  df
}

for(i in 2:5) {
  iris <- multipetal(df=iris, n=i)
}
iris

I think this comes from dplyr 1.0.0, but I'm not sure (I also have rlang 0.4.7, if it matters).

If you need the same operation several times, that usually tells you that your data format is not optimal. You want a longer format, with n being a column in the data.frame, which can be achieved by a cross join:

library(tidyverse)
iris %>% mutate(identifier = 1:n()) %>% #necessary to disambiguate row 102 from row 143 (complete duplicates)
   full_join(tibble(n = 1:5), by=character()) %>% #cross join for long format (dplyr >= 1.1.0 deprecates by=character() in favour of cross_join())
   mutate(petal = Petal.Width * n) %>% #calculation in long format
   pivot_wider(names_from=n, values_from=petal, names_prefix="petal.width.") #back to wider format (if desired)

Result:

# A tibble: 150 x 11
   Sepal.Length Sepal.Width Petal.Length Petal.Width Species identifier petal.width.1 petal.width.2 petal.width.3
          <dbl>       <dbl>        <dbl>       <dbl> <fct>        <int>         <dbl>         <dbl>         <dbl>
 1          5.1         3.5          1.4         0.2 setosa           1           0.2           0.4           0.6
 2          4.9         3            1.4         0.2 setosa           2           0.2           0.4           0.6
 3          4.7         3.2          1.3         0.2 setosa           3           0.2           0.4           0.6
 4          4.6         3.1          1.5         0.2 setosa           4           0.2           0.4           0.6
 5          5           3.6          1.4         0.2 setosa           5           0.2           0.4           0.6
 6          5.4         3.9          1.7         0.4 setosa           6           0.4           0.8           1.2
 7          4.6         3.4          1.4         0.3 setosa           7           0.3           0.6           0.9
 8          5           3.4          1.5         0.2 setosa           8           0.2           0.4           0.6
 9          4.4         2.9          1.4         0.2 setosa           9           0.2           0.4           0.6
10          4.9         3.1          1.5         0.1 setosa          10           0.1           0.2           0.3
# ... with 140 more rows, and 2 more variables: petal.width.4 <dbl>, petal.width.5 <dbl>

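Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.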

 