
dplyr .data pronoun vs “quosure” approach

dplyr v0.7.0 introduced the .data pronoun, which lets us refer to variables by name using strings. I am curious whether this approach is preferred over the "quosure" approach. For example, here is an approach that uses the .data pronoun:

varname <- "gear"
data_pronoun_method_df <- dplyr::mutate(mtcars, new_col = .data[[varname]] + 2)

This is compared to an example using the quosure approach:

quo_varname <- rlang::quo(gear)
quo_method_df <- dplyr::mutate(mtcars, new_col = !! quo_varname + 2)

Both methods produce the same output:

data_pronoun_method_df

#     mpg cyl  disp  hp drat    wt  qsec vs am gear carb new_col
# 1  21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4       6
# 2  21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4       6
# 3  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1       6
# 4  21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1       5
# 5  18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2       5
# 6  18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1       5
# 7  14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4       5
# 8  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2       6
# [ reached getOption("max.print") -- omitted 24 rows ]

all.equal(data_pronoun_method_df, quo_method_df)
# [1] TRUE

Is there any real difference? What are the advantages and disadvantages of either method?

The .data pronoun can be useful to work around NSE, but it is more or less orthogonal to tidy eval. Its main purpose is to make sure the variable will be looked up in the data frame. If the column doesn't exist, you get an error. This is in contrast to bare names, which can pick up local objects if they are defined:

library(dplyr)

other <- 1e10
transmute(mtcars, 2 * other)             # Succeeds erroneously: picks up the local `other`
transmute(mtcars, 2 * .data[["other"]])  # Fails: `other` is not a column of mtcars
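To tie this back to the question, here is a minimal sketch of a wrapper that takes a column name as a string. The helper name `add_two` and its arguments are hypothetical, chosen only for illustration. It suggests that a name which is not a column raises an error rather than silently falling back to a global object of the same name:

```r
library(dplyr)

# Hypothetical helper: `add_two`, `df` and `varname` are illustrative names.
add_two <- function(df, varname) {
  # .data[[varname]] is looked up in the data frame only
  mutate(df, new_col = .data[[varname]] + 2)
}

other <- 1e10  # a global that a bare name could silently pick up

# "gear" is a column, so this works as in the question
ok <- add_two(mtcars, "gear")

# "other" is not a column, so the lookup errors instead of using the global
res <- tryCatch(
  add_two(mtcars, "other"),
  error = function(e) "lookup failed"
)
```

Wrapping the call in tryCatch() is only there so the script keeps running; in interactive use you would simply see the error.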

Using the .data pronoun is more reliable than referring to the data frame explicitly (for example, with mtcars[["am"]]) because the data might be grouped:

group_by(mtcars, cyl) %>%
  transmute(2L * .data[["am"]])

In that example, .data[["am"]] represents slices of the am column defined by the levels of cyl.
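One way to see the difference is to compare lengths inside a grouped summarise(). This is a rough sketch; the column names `n_pronoun` and `n_explicit` are made up for the illustration:

```r
library(dplyr)

sizes <- mtcars %>%
  group_by(cyl) %>%
  summarise(
    n_pronoun  = length(.data[["am"]]),   # length of the current group's slice
    n_explicit = length(mtcars[["am"]])   # always the full, ungrouped column
  )

sizes
```

Here n_explicit should equal nrow(mtcars) for every group, while n_pronoun should vary by group and add up to nrow(mtcars).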

Edit: For completeness, you can accomplish the same thing with quosures and quasiquotation. If you create a quosure that wraps a symbol and has the empty environment as its environment, the symbol lookup will only succeed if the data frame contains such a column:

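library(dplyr)
library(rlang)  # provides new_quosure() and empty_env()

other <- 1e10
quo <- new_quosure(quote(other), empty_env())
transmute(mtcars, 2L * !!quo)  # Fails: `other` is not a column and the empty env has no bindings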

