Passing column name inside a function using dplyr
I know that lazyeval can be used inside a function to refer to column names with dplyr, but I am stuck. In general, when creating a function that uses dplyr and references column names passed as function arguments, what is the most idiomatic way to achieve that? Thanks.
library(dplyr)
library(lazyeval)
## Create data frame
df0 <- data.frame(x=rnorm(100), y=runif(100))
##########################################
## Sample mean; this way works
##########################################
df0 %>%
  filter(!is.na(x)) %>%
  summarize(mean=mean(x))
##########################################
## Sample mean via function; does not work
##########################################
dfSummary2 <- function(df, var_y) {
  p <- df %>%
    filter(!is.na(as.name(var_y))) %>%
    summarize(mean=mean(as.name(var_y)))
  return(p)
}
dfSummary2(df0, "x")
# mean
# 1 NA
# Warning message:
# In mean.default("x") : argument is not numeric or logical: returning NA
##########################################
## Sample mean via function; also does not work
##########################################
dfSummary <- function(df, var_y) {
  p <- df %>%
    filter(!is.na(var_y)) %>%
    summarize(mean=mean(var_y))
  return(p)
}
dfSummary(df0, "x")
# mean
# 1 NA
# Warning message:
# In mean.default("x") : argument is not numeric or logical: returning NA
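Both attempts fail for the same underlying reason: filter() and summarize() treat var_y (or as.name(var_y)) as an ordinary value rather than as a reference to a column of df. as.name(var_y) builds the symbol x but never evaluates it against the data, and plain var_y is just the string "x". In both cases !is.na(...) is TRUE, so every row is kept, and mean(...) returns NA with the warning shown. Below is a minimal sketch of two fixes for a column name passed as a string, assuming dplyr 0.7 or later: splice the symbol in with !!, or look the name up with the .data pronoun. The function names dfSummarySym and dfSummaryData and the data frame df_demo are hypothetical, chosen for illustration only.

```r
library(dplyr)

# Fix 1 (hypothetical name): turn the string into a symbol, then splice it in
# with !! so dplyr evaluates it as a column of df instead of as a plain value.
dfSummarySym <- function(df, var_y) {
  col <- as.name(var_y)
  df %>%
    filter(!is.na(!!col)) %>%
    summarize(mean = mean(!!col))
}

# Fix 2 (hypothetical name): the .data pronoun looks the column up by its string name.
dfSummaryData <- function(df, var_y) {
  df %>%
    filter(!is.na(.data[[var_y]])) %>%
    summarize(mean = mean(.data[[var_y]]))
}

# Small deterministic data with one NA per column
df_demo <- data.frame(x = c(1, 2, NA, 5), y = c(0.5, NA, 0.1, 0.4))
dfSummarySym(df_demo, "x")
dfSummaryData(df_demo, "y")
```

When the caller already holds the name as a string, .data[[var_y]] is usually the more readable of the two.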
If you are using dplyr, the comments suggesting summarize_ and filter_ point in the right direction; more information is available via vignette("nse").
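vignette("nse") describes the older lazyeval-based interface behind filter_() and summarize_(). Since dplyr 0.7 the recommended approach has been documented in the "Programming with dplyr" vignette (vignette("programming", package = "dplyr")): capture the argument with enquo() and unquote it with !!. A minimal sketch follows; note that it takes a bare column name rather than a string, and that dfSummaryQuo and df_demo are hypothetical names.

```r
library(dplyr)

# Hypothetical helper: var_y is a bare column name, e.g. dfSummaryQuo(df, x).
dfSummaryQuo <- function(df, var_y) {
  var_y <- rlang::enquo(var_y)  # capture what the caller typed as a quosure
  df %>%
    filter(!is.na(!!var_y)) %>%  # !! unquotes the quosure inside the verb
    summarize(mean = mean(!!var_y))
}

df_demo <- data.frame(x = c(1, 2, NA, 5), y = c(0.5, NA, 0.1, 0.4))
dfSummaryQuo(df_demo, x)
```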
Setting the dplyr question aside, this gives a function that takes a variable column name without needing dplyr:
dfSummary <- function(df, var_y) {
  mean(df[[var_y]], na.rm = TRUE)
}
dfSummary(df0, "x")
[1] 0.105659
dfSummary(df0, "y")
[1] 0.4948618
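If the result should have the same shape as the summarize() output (a one-row data frame with a mean column), the base R version can wrap the value. This is a sketch under that assumption; the hypothetical name dfSummaryBase and the stopifnot() check for an unknown column are additions, not part of the original answer.

```r
# Base R only (hypothetical helper): df[[var_y]] looks the column up by its string name.
dfSummaryBase <- function(df, var_y) {
  stopifnot(var_y %in% names(df))  # fail early on a misspelled column name
  data.frame(mean = mean(df[[var_y]], na.rm = TRUE))
}

df_demo <- data.frame(x = c(1, 2, NA, 5), y = c(0.5, NA, 0.1, 0.4))
dfSummaryBase(df_demo, "x")
```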
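For information, filter_ and summarize_ have been deprecated. It is better to use {{ }} ("curly-curly") and pass the column name unquoted: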
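dfSummary <- function(df, var_y) {
  # {{ var_y }} refers to the column the caller named, in every verb that uses it
  p <- df %>%
    filter(!is.na({{ var_y }})) %>%
    summarize(mean=mean({{ var_y }}))
  return(p)
}
# var_y is an unquoted column name, not a string
dfSummary(df0, x)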
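Because {{ }} captures the argument, it can also name the output column after the input column, a pattern shown in dplyr's "Programming with dplyr" vignette. This sketch assumes a recent dplyr and rlang, since the "mean_{{var_y}}" := name-injection syntax is not available in older releases; dfSummaryNamed and df_demo are hypothetical names.

```r
library(dplyr)

# Hypothetical helper: embrace var_y both in the expression and in the output name.
dfSummaryNamed <- function(df, var_y) {
  df %>%
    filter(!is.na({{ var_y }})) %>%
    summarize("mean_{{var_y}}" := mean({{ var_y }}))
}

df_demo <- data.frame(x = c(1, 2, NA, 5), y = c(0.5, NA, 0.1, 0.4))
dfSummaryNamed(df_demo, x)
```

Called with x, the result has a single column named mean_x instead of mean.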