Use dynamic name for new column/variable in `dplyr`
I want to use dplyr::mutate() to create multiple new columns in a data frame. The column names and their contents should be dynamically generated.
Example data from iris:
library(dplyr)
iris <- as_tibble(iris)
I've created a function to mutate my new columns from the Petal.Width variable:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df <- mutate(df, varname = Petal.Width * n) ## problem arises here
df
}
Now I create a loop to build my columns:
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
However, since mutate thinks varname is a literal variable name, the loop only creates one new variable (called varname) instead of four (called petal.2 - petal.5).
How can I get mutate() to use my dynamic name as variable name?
Since you are dynamically building a variable name as a character value, it makes more sense to do assignment using standard data.frame indexing, which allows character values for column names. For example:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df[[varname]] <- with(df, Petal.Width * n)
df
}
The mutate function makes it very easy to name new columns via named parameters. But that assumes you know the name when you type the command. If you want to dynamically specify the column name, then you need to also build the named argument.
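To make "build the named argument" concrete, here is a minimal sketch of one way to do it by hand: create a named list of expressions and splice it into mutate() with `!!!`. The `!!!` operator and the names `args` and `out` are not part of the answer above; they are assumptions for illustration, and this relies on a dplyr version with tidy eval support (0.7 or later).

```r
library(dplyr)

n <- 3

# Build the expression Petal.Width * 3 with the value of n inlined (base R bquote)
expr_n <- bquote(Petal.Width * .(n))

# Name the list element with the dynamically generated column name
args <- setNames(list(expr_n), paste("petal", n, sep = "."))

# Splice the named list into mutate() as if it had been typed as petal.3 = ...
out <- mutate(as_tibble(iris), !!!args)
names(out)
```

The approaches below achieve the same thing with less ceremony.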
With the latest dplyr version you can use the syntax from the glue package to name parameters when using :=. Here the {} in the name grabs the value by evaluating the expression inside.
multipetal <- function(df, n) {
mutate(df, "petal.{n}" := Petal.Width * n)
}
If you are passing a column name to your function, you can use {{}} in the string as well as for the column name:
meanofcol <- function(df, col) {
mutate(df, "Mean of {{col}}" := mean({{col}}))
}
meanofcol(iris, Petal.Width)
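As a quick check of the glue-style naming, the sketch below runs the multipetal function from above for n = 2 to 5 and looks at the resulting column names. It assumes dplyr 1.0.0 or later; the `result` variable is only illustrative.

```r
library(dplyr)

# Same function as above: "{n}" in the name is filled in with the value of n
multipetal <- function(df, n) {
  mutate(df, "petal.{n}" := Petal.Width * n)
}

result <- as_tibble(iris)
for (i in 2:5) {
  result <- multipetal(result, i)
}

# The four new columns come after the five original ones
names(result)[6:9]
```

Each pass through the loop adds one column, so the question's original loop works unchanged once the name is built this way.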
dplyr starting with version 0.7 allows you to use := to dynamically assign parameter names. You can write your function as:
# --- dplyr version 0.7+---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, !!varname := Petal.Width * n)
}
For more information, see the documentation available from vignette("programming", "dplyr").
Slightly earlier versions of dplyr (>=0.3 <0.7) encouraged the use of "standard evaluation" alternatives to many of the functions. See the Non-standard evaluation vignette for more information (vignette("nse")).
So here, the answer is to use mutate_() rather than mutate() and do:
# --- dplyr version 0.3-0.5---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
varval <- lazyeval::interp(~Petal.Width * n, n=n)
mutate_(df, .dots= setNames(list(varval), varname))
}
Note this is also possible in older versions of dplyr that existed when the question was originally posed. It requires careful use of quote and setNames:
# --- dplyr versions < 0.3 ---
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
pp <- c(quote(df), setNames(list(quote(Petal.Width * n)), varname))
do.call("mutate", pp)
}
In the new release of dplyr (0.6.0, expected in April 2017), we can also do an assignment (:=) and pass variables as column names by unquoting (!!), so that the variable's value rather than its name is used:
library(dplyr)
multipetalN <- function(df, n){
varname <- paste0("petal.", n)
df %>%
mutate(!!varname := Petal.Width * n)
}
data(iris)
iris1 <- tbl_df(iris)
iris2 <- tbl_df(iris)
for(i in 2:5) {
iris2 <- multipetalN(df=iris2, n=i)
}
Checking the output based on @MrFlick's multipetal applied on 'iris1':
identical(iris1, iris2)
#[1] TRUE
After a lot of trial and error, I found the pattern UQ(rlang::sym("some string here")) really useful for working with strings and dplyr verbs. It seems to work in a lot of surprising situations.
Here's an example with mutate. We want to create a function that adds together two columns, where you pass the function both column names as strings. We can use this pattern, together with the assignment operator :=, to do this.
## Take column `name1`, add it to column `name2`, and call the result `new_name`
mutate_values <- function(new_name, name1, name2){
mtcars %>%
mutate(UQ(rlang::sym(new_name)) := UQ(rlang::sym(name1)) + UQ(rlang::sym(name2)))
}
mutate_values('test', 'mpg', 'cyl')
The pattern works with other dplyr functions as well. Here's filter:
## filter a column by a value
filter_values <- function(name, value){
mtcars %>%
filter(UQ(rlang::sym(name)) != value)
}
filter_values('gear', 4)
Or arrange:
## transform a variable and then sort by it
arrange_values <- function(name, transform){
mtcars %>%
arrange(UQ(rlang::sym(name)) %>% UQ(rlang::sym(transform)))
}
arrange_values('mpg', 'sin')
For select, you don't need to use the pattern. Instead you can use !!:
## select a column
select_name <- function(name){
mtcars %>%
select(!!name)
}
select_name('mpg')
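For comparison, here is a sketch of the same mutate example written with the `.data` pronoun, which looks a column up by its string name, plus `!!` on the left-hand side of :=. This is not the pattern described above but an alternative that, as far as I know, works in current dplyr versions; the function name `mutate_values2` and the extra `df` argument are hypothetical.

```r
library(dplyr)

# Hypothetical variant of mutate_values() that takes the data frame as an argument
mutate_values2 <- function(df, new_name, name1, name2) {
  df %>%
    # .data[[name]] fetches the column whose name is stored in the string `name`
    mutate(!!new_name := .data[[name1]] + .data[[name2]])
}

out <- mutate_values2(mtcars, "test", "mpg", "cyl")
head(out$test)
```

The same `.data[[name]]` lookup can be used inside filter() and arrange() wherever a column is referred to by a string.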
With rlang 0.4.0 we have the curly-curly operator ({{}}), which makes this very easy. When a dynamic column name shows up on the left-hand side of an assignment, use :=.
library(dplyr)
library(rlang)
iris1 <- tbl_df(iris)
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
mutate(df, {{varname}} := Petal.Width * n)
}
multipetal(iris1, 4)
# A tibble: 150 x 6
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.4
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0.8
# 2 4.9 3 1.4 0.2 setosa 0.8
# 3 4.7 3.2 1.3 0.2 setosa 0.8
# 4 4.6 3.1 1.5 0.2 setosa 0.8
# 5 5 3.6 1.4 0.2 setosa 0.8
# 6 5.4 3.9 1.7 0.4 setosa 1.6
# 7 4.6 3.4 1.4 0.3 setosa 1.2
# 8 5 3.4 1.5 0.2 setosa 0.8
# 9 4.4 2.9 1.4 0.2 setosa 0.8
#10 4.9 3.1 1.5 0.1 setosa 0.4
# … with 140 more rows
We can also pass quoted/unquoted variable names to be assigned as column names.
multipetal <- function(df, name, n) {
mutate(df, {{name}} := Petal.Width * n)
}
multipetal(iris1, temp, 3)
# A tibble: 150 x 6
# Sepal.Length Sepal.Width Petal.Length Petal.Width Species temp
# <dbl> <dbl> <dbl> <dbl> <fct> <dbl>
# 1 5.1 3.5 1.4 0.2 setosa 0.6
# 2 4.9 3 1.4 0.2 setosa 0.6
# 3 4.7 3.2 1.3 0.2 setosa 0.6
# 4 4.6 3.1 1.5 0.2 setosa 0.6
# 5 5 3.6 1.4 0.2 setosa 0.6
# 6 5.4 3.9 1.7 0.4 setosa 1.2
# 7 4.6 3.4 1.4 0.3 setosa 0.900
# 8 5 3.4 1.5 0.2 setosa 0.6
# 9 4.4 2.9 1.4 0.2 setosa 0.6
#10 4.9 3.1 1.5 0.1 setosa 0.3
# … with 140 more rows
It works the same with:
multipetal(iris1, "temp", 3)
Here's another version, and it's arguably a bit simpler.
multipetal <- function(df, n) {
varname <- paste("petal", n, sep=".")
df<-mutate_(df, .dots=setNames(paste0("Petal.Width*",n), varname))
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species petal.2 petal.3 petal.4 petal.5
1 5.1 3.5 1.4 0.2 setosa 0.4 0.6 0.8 1
2 4.9 3.0 1.4 0.2 setosa 0.4 0.6 0.8 1
3 4.7 3.2 1.3 0.2 setosa 0.4 0.6 0.8 1
4 4.6 3.1 1.5 0.2 setosa 0.4 0.6 0.8 1
5 5.0 3.6 1.4 0.2 setosa 0.4 0.6 0.8 1
6 5.4 3.9 1.7 0.4 setosa 0.8 1.2 1.6 2
I am also adding an answer that augments this a little bit. I came to this entry when searching for an answer, and it had almost what I needed, but I needed a bit more, which I got via @MrFlick's answer and the R lazyeval vignettes.
I wanted to make a function that could take a dataframe and a vector of column names (as strings) that I want to be converted from a string to a Date object. I couldn't figure out how to make as.Date() take an argument that is a string and convert it to a column, so I did it as shown below.
Below is how I did this via SE mutate (mutate_()) and the .dots argument. Criticisms that make this better are welcome.
library(dplyr)
dat <- data.frame(a="leave alone",
dt="2015-08-03 00:00:00",
dt2="2015-01-20 00:00:00")
# This function takes a dataframe and list of column names
# that have strings that need to be
# converted to dates in the data frame
convertSelectDates <- function(df, dtnames=character(0)) {
for (col in dtnames) {
varval <- sprintf("as.Date(%s)", col)
df <- df %>% mutate_(.dots= setNames(list(varval), col))
}
return(df)
}
dat <- convertSelectDates(dat, c("dt", "dt2"))
dat %>% str
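Since mutate_() is deprecated in current dplyr, one possible modernization is sketched below using across() with all_of(), which selects columns from a character vector. This assumes dplyr 1.0.0 or later; the function name `convertSelectDates2` and the `out` variable are hypothetical, and this is a sketch rather than a drop-in replacement tested against the original use case.

```r
library(dplyr)

dat <- data.frame(a = "leave alone",
                  dt = "2015-08-03 00:00:00",
                  dt2 = "2015-01-20 00:00:00",
                  stringsAsFactors = FALSE)

# Convert the columns named in dtnames to Date; all other columns are untouched
convertSelectDates2 <- function(df, dtnames = character(0)) {
  mutate(df, across(all_of(dtnames), as.Date))
}

out <- convertSelectDates2(dat, c("dt", "dt2"))
str(out)
```

Because all_of() takes the names as strings, no loop and no string pasting of expressions is needed.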
You may enjoy package friendlyeval, which presents a simplified tidy eval API and documentation for newer/casual dplyr users.
You are creating strings that you wish mutate to treat as column names. So using friendlyeval you could write:
multipetal <- function(df, n) {
varname <- paste("petal", n , sep=".")
df <- mutate(df, !!treat_string_as_col(varname) := Petal.Width * n)
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
Under the hood, this calls rlang functions that check that varname is legal as a column name.
friendlyeval code can be converted to equivalent plain tidy eval code at any time with an RStudio addin.
While I enjoy using dplyr for interactive use, I find it extraordinarily tricky to do this using dplyr because you have to jump through hoops with lazyeval::interp(), setNames, and similar workarounds.
Here is a simpler version using base R, which extends @MrFlick's solution. To me at least, it seems more intuitive to put the loop inside the function.
multipetal <- function(df, n) {
for (i in 1:n){
varname <- paste("petal", i , sep=".")
df[[varname]] <- with(df, Petal.Width * i)
}
df
}
multipetal(iris, 3)
Another alternative: use {} inside quotation marks to easily create dynamic names. This is similar to other solutions but not exactly the same, and I find it easier.
library(dplyr)
library(tibble)
iris <- as_tibble(iris)
multipetal <- function(df, n) {
df <- mutate(df, "petal.{n}" := Petal.Width * n)
df
}
for(i in 2:5) {
iris <- multipetal(df=iris, n=i)
}
iris
I think this comes from dplyr 1.0.0 but not sure (I also have rlang 4.7.0 if it matters).
If you need the same operation several times, that usually tells you that your data format is not optimal. You want a longer format with n being a column in the data.frame, which can be achieved by a cross join:
library(tidyverse)
iris %>% mutate(identifier = 1:n()) %>% #necessary to disambiguate row 102 from row 143 (complete duplicates)
full_join(tibble(n = 1:5), by=character()) %>% #cross join for long format
mutate(petal = Petal.Width * n) %>% #calculation in long format
pivot_wider(names_from=n, values_from=petal, names_prefix="petal.width.") #back to wider format (if desired)
Result:
# A tibble: 150 x 11
Sepal.Length Sepal.Width Petal.Length Petal.Width Species identifier petal.width.1 petal.width.2 petal.width.3
<dbl> <dbl> <dbl> <dbl> <fct> <int> <dbl> <dbl> <dbl>
1 5.1 3.5 1.4 0.2 setosa 1 0.2 0.4 0.6
2 4.9 3 1.4 0.2 setosa 2 0.2 0.4 0.6
3 4.7 3.2 1.3 0.2 setosa 3 0.2 0.4 0.6
4 4.6 3.1 1.5 0.2 setosa 4 0.2 0.4 0.6
5 5 3.6 1.4 0.2 setosa 5 0.2 0.4 0.6
6 5.4 3.9 1.7 0.4 setosa 6 0.4 0.8 1.2
7 4.6 3.4 1.4 0.3 setosa 7 0.3 0.6 0.9
8 5 3.4 1.5 0.2 setosa 8 0.2 0.4 0.6
9 4.4 2.9 1.4 0.2 setosa 9 0.2 0.4 0.6
10 4.9 3.1 1.5 0.1 setosa 10 0.1 0.2 0.3
# ... with 140 more rows, and 2 more variables: petal.width.4 <dbl>, petal.width.5 <dbl>