What are the differences between R's new native pipe `|>` and the magrittr pipe `%>%`?

In R 4.1 a native pipe operator was introduced that is "more streamlined" than previous implementations. I already noticed one difference between the native |> and the magrittr pipe %>%, namely that 2 %>% sqrt works but 2 |> sqrt doesn't and has to be written as 2 |> sqrt(). Are there more differences and pitfalls to be aware of when using the new pipe operator?

Another difference concerns the piped-in value: `.` can be used as a placeholder for it in magrittr's pipe:

c("dogs", "cats", "rats") %>% grepl("at", .)
#[1] FALSE  TRUE  TRUE

But this is not possible with R's native pipe:

c("dogs", "cats", "rats") |> grepl("at", .)
#Error in grepl(c("dogs", "cats", "rats"), "at", .) : object '.' not found

Here are different ways to refer to the piped-in value with the native pipe:

  1. Write a separate function

find_at = function(x) grepl("at", x)
c("dogs", "cats", "rats") |> find_at()
#[1] FALSE  TRUE  TRUE

  2. Use an anonymous function

    a) Use the "old" syntax

    c("dogs", "cats", "rats") |> {function(x) grepl("at", x)}()

    b) Use the new anonymous function syntax

    c("dogs", "cats", "rats") |> {\(x) grepl("at", x)}()

  3. Specify the first parameter by name. This relies on the fact that the native pipe inserts the left-hand side as the first unnamed argument, so if you name the first parameter, the piped-in value "overflows" into the second (and so on if you specify more than one parameter by name).

c("dogs", "cats", "rats") |> grepl(pattern="at")
#> [1] FALSE  TRUE  TRUE
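The "overflow" can be made visible without running grepl() at all. The following is a minimal sketch, assuming R 4.1 or later; the variable name `matched` is only illustrative. quote() shows the call the parser builds, and match.call() shows which formal argument the piped-in value ends up in.

```r
# quote() returns the parsed call without evaluating it:
# the left-hand side is simply inserted as the first argument.
quote(x |> grepl(pattern = "at"))
# grepl(x, pattern = "at")

# Ordinary argument matching then does the rest: `pattern` is taken by
# name, so the unnamed x falls through to the next formal, which is `x`.
matched <- match.call(grepl, quote(grepl(x, pattern = "at")))
matched$pattern
matched$x
```

So there is nothing pipe-specific about the trick; it is the usual rule that named arguments are matched before positional ones.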

The base R pipe |>, added in R 4.1.0, "just" does functional composition. I.e., we can see that its use really is just the same as the function call:

> 1:5 |> sum()             # simple use of |>
[1] 15
> deparse(substitute( 1:5 |> sum() ))
[1] "sum(1:5)"
> 

That has some consequences:

  • it makes it a little faster
  • it makes it a little simpler and more robust
  • it makes it a little more restrictive: sum() here needs the parens for a proper call
  • it limits uses of the 'implicit' data argument
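These consequences can be checked at parse time. The sketch below assumes R 4.1 or later; `parsed`, `same_call` and `bare` are illustrative names. It compares the parsed pipe expression with the plain call and captures the error raised for a right-hand side without parentheses.

```r
# The pipe is resolved by the parser, so the parsed expression is
# already the nested call.
parsed <- parse(text = "1:5 |> sum()")[[1]]
same_call <- identical(parsed, quote(sum(1:5)))
same_call

# A bare function name on the right-hand side is rejected while parsing,
# before anything is evaluated; tryCatch() captures the message.
bare <- tryCatch(
  parse(text = "1:5 |> sum"),
  error = function(e) conditionMessage(e)
)
bare
```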

This leads to the possible use of =>, which is currently "available but not active" (you need to set the environment variable _R_USE_PIPEBIND_ to use it, and it may change for R 4.2.0).

(This was first offered as an answer to a question duplicating this one, and I just copied it over as suggested.)

Edit: As the follow-up question on 'what is =>' comes up, here is a quick follow-up. Note that this operator is subject to change.

> Sys.setenv("_R_USE_PIPEBIND_"=TRUE)
> mtcars |> subset(cyl == 4) |> d => lm(mpg ~ disp, data = d)

Call:
lm(formula = mpg ~ disp, data = subset(mtcars, cyl == 4))

Coefficients:
(Intercept)         disp  
     40.872       -0.135  

> deparse(substitute(mtcars |> subset(cyl==4) |> d => lm(mpg ~ disp, data = d)))
[1] "lm(mpg ~ disp, data = subset(mtcars, cyl == 4))"
> 

The deparse(substitute(...)) is particularly nice here.
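Since => is experimental and disabled by default, a version that does not depend on the environment variable may be useful. This is a sketch, assuming R 4.1 or later, that swaps the pipe-bind for an anonymous function; the name `fit` is illustrative.

```r
# Same model as above, with the data frame bound to d by an anonymous
# function instead of the experimental => operator.
fit <- mtcars |>
  subset(cyl == 4) |>
  (\(d) lm(mpg ~ disp, data = d))()

coef(fit)
```

The coefficients should match the output above; only the recorded call differs, because the model is fitted inside the anonymous function.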

The native pipe is implemented as a syntax transformation, so 2 |> sqrt() has no discernible overhead compared to sqrt(2), whereas 2 %>% sqrt() comes with a small penalty.

library(magrittr)
library(microbenchmark)
microbenchmark(sqrt(1), 
               2 |> sqrt(), 
               3 %>% sqrt())
# Unit: nanoseconds
#          expr  min     lq    mean median   uq   max neval
#       sqrt(1)  117  126.5  141.66  132.0  139   246   100
#       sqrt(2)  118  129.0  156.16  134.0  145  1792   100
#  3 %>% sqrt() 2695 2762.5 2945.26 2811.5 2855 13736   100

You can see that the expression 2 |> sqrt() passed to microbenchmark is parsed as sqrt(2). This can also be seen in

quote(2 |> sqrt())
# sqrt(2)
Topic          Magrittr 2.0.3                     Base 4.2.0
Operator       %>%                                |>
Function call  %>% sum()                          |> sum()
               %>% sum                            Needs brackets
               %>% `$`(cyl)                       Some functions are not supported
Placeholder    .                                  _
               %>% lm(mpg ~ disp, data = .)       |> lm(mpg ~ disp, data = _)
               %>% lm(mpg ~ disp, .)              Needs named argument
               %>% setNames(., .)                 Can only appear once
               %>% {sum(sqrt(.))}                 Nested calls are not allowed
Environment    Additional function environment    "x" |> assign(1)
Speed          Overhead of function call          Syntax transformation

Many differences and limitations disappear when using |> in combination with an (anonymous) function: 1 |> (\(.) .)() , -3:3 |> (\(.) sum(2*abs(.) - 3*.^2))()
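Evaluating those two expressions shows why this works: inside the anonymous function, `.` is an ordinary parameter name, so it may appear several times and inside nested calls. A minimal sketch, assuming R 4.1 or later; `res1` and `res2` are illustrative names.

```r
# `.` is just the parameter of the anonymous function here.
res1 <- 1 |> (\(.) .)()
res1
# [1] 1

# The parameter is used twice, inside nested calls:
# 2 * sum(abs(-3:3)) - 3 * sum((-3:3)^2) = 2 * 12 - 3 * 28
res2 <- -3:3 |> (\(.) sum(2 * abs(.) - 3 * .^2))()
res2
# [1] -60
```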


Needs brackets

library(magrittr)

1:3 |> sum
#Error: The pipe operator requires a function call as RHS

1:3 |> sum()
#[1] 6

1:3 %>% sum
#[1] 6

1:3 %>% sum()
#[1] 6

Some functions are not supported, but some can still be called by placing them in brackets, calling them via ::, calling them inside a function, or defining a link to the function.

mtcars |> `$`(cyl)
#Error: function '$' not supported in RHS call of a pipe

mtcars |> (`$`)(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> base::`$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars |> (\(.) .$cyl)()
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

fun <- `$`
mtcars |> fun(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

mtcars %>% `$`(cyl)
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4
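For the specific case of extracting a column, ordinary base functions can stand in for `$` on the right-hand side. This is a sketch of two alternatives not shown in the original answer, assuming R 4.1 or later; `cyl1` and `cyl2` are illustrative names.

```r
# getElement() is a regular function, so the pipe accepts it directly.
cyl1 <- mtcars |> getElement("cyl")

# with() evaluates its expression with the data frame's columns in scope.
cyl2 <- mtcars |> with(cyl)

identical(cyl1, mtcars$cyl)
identical(cyl2, mtcars$cyl)
```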

Placeholder needs named argument

2 |> setdiff(1:3, _)
#Error: pipe placeholder can only be used as a named argument

2 |> setdiff(1:3, y = _)
#[1] 1 3

2 |> (\(.) setdiff(1:3, .))()
#[1] 1 3

2 %>% setdiff(1:3, .)
#[1] 1 3

2 %>% setdiff(1:3, y = .)
#[1] 1 3

Placeholder can only appear once

1:3 |> setNames(object = _, nm = _)
#Error in setNames(object = "_", nm = "_") : 
#  pipe placeholder may only appear once

1:3 |> (\(.) setNames(., .))()
#1 2 3 
#1 2 3 

1:3 |> list() |> setNames(".") |> with(setNames(., .))
#1 2 3 
#1 2 3 

1:3 %>% setNames(object = ., nm = .)
#1 2 3
#1 2 3

1:3 %>% setNames(., .)
#1 2 3 
#1 2 3

Nested calls are not allowed

1:3 |> sum(sqrt(x=_))
#Error in sum(1:3, sqrt(x = "_")) : invalid use of pipe placeholder

1:3 |> (\(.) sum(sqrt(.)))()
#[1] 4.146264

1:3 %>% {sum(sqrt(.))}
#[1] 4.146264
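When the nesting is a simple chain, as it is here, another option is to unnest it into successive pipe steps, so that no placeholder is needed. A minimal sketch, assuming R 4.1 or later; `total` is an illustrative name.

```r
# sum(sqrt(1:3)) written as a chain of pipe steps.
total <- 1:3 |> sqrt() |> sum()
total
# [1] 4.146264
```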

No additional environment

assign("x", 1)
x
#[1] 1

"x" |> assign(2)
x
#[1] 2

"x" |> (\(x) assign(x, 3))()
x
#[1] 2

"x" %>% assign(4)
x
#[1] 2
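The anonymous-function result above follows from where assign() runs: by default it writes into the environment it is called from, which inside a function is that function's own frame. The sketch below assumes R 4.1 or later; the envir = parent.frame() argument is an addition not used in the original answer, and `nm`, `x_after_local` and `x_after_parent` are illustrative names.

```r
x <- 2

# Inside the anonymous function, assign() creates a local x in the
# function's frame; the outer x is untouched.
"x" |> (\(nm) assign(nm, 3))()
x_after_local <- x

# Pointing assign() at the caller's environment changes the outer x.
"x" |> (\(nm) assign(nm, 3, envir = parent.frame()))()
x_after_parent <- x

c(x_after_local, x_after_parent)
# [1] 2 3
```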

Other possibilities:
A different pipe operator and a different placeholder can be realized with the Bizarro pipe ->.; which is not a true pipe (see its disadvantages), since it overwrites the variable `.`

1:3 ->.; sum(.)
#[1] 6

mtcars ->.; .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

1:3 ->.; setNames(., .)
#1 2 3 
#1 2 3 

1:3 ->.; sum(sqrt(x=.))
#[1] 4.146264

"x" ->.; assign(., 5)
x
#[1] 5

It is also evaluated in a different order:

x <- data.frame(a=0)
f1 <- \(x) {message("IN 1"); x$b <- 1; message("OUT 1"); x}
f2 <- \(x) {message("IN 2"); x$c <- 2; message("OUT 2"); x}

x ->.; f1(.) ->.; f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

x |> f1() |> f2()
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2

f2(f1(x))
#IN 2
#IN 1
#OUT 1
#OUT 2
#  a b c
#1 0 1 2
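The nested order comes from lazy evaluation: f2 starts running before its argument is evaluated, so its first message appears before f1 is called at all. The sketch below assumes R 4.1 or later and that the sequential order is wanted with the native pipe; `g1`, `g2` and `trace_log` are hypothetical names, and the steps are recorded in a vector instead of being printed with message().

```r
trace_log <- character()

g1 <- \(x) {
  trace_log <<- c(trace_log, "IN 1")
  x$b <- 1
  trace_log <<- c(trace_log, "OUT 1")
  x
}

g2 <- \(x) {
  force(x)  # evaluate the argument (and thereby run g1) before anything else
  trace_log <<- c(trace_log, "IN 2")
  x$c <- 2
  trace_log <<- c(trace_log, "OUT 2")
  x
}

res <- data.frame(a = 0) |> g1() |> g2()
trace_log
# [1] "IN 1"  "OUT 1" "IN 2"  "OUT 2"
```

Without the force(x) line the log would start with "IN 2", as in the output above.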

Or define your own operator, which is also evaluated in a different order:

":=" <- function(lhs, rhs) {
  e <- exists(".", parent.frame(), inherits = FALSE)
  . <- get0(".", envir = parent.frame(), inherits = FALSE)
  assign(".", lhs, envir=parent.frame())
  on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
            if(e) {
              assign(".", ., envir=parent.frame())
            } else {
              if(exists(".", parent.frame())) rm(., envir = parent.frame())
            }
          })
  eval(substitute(rhs), parent.frame())
}

. <- 0
"." := assign(., 1)
.
#[1] 1

1:3 := sum(.)
#[1] 6
.
#[1] 1

mtcars := .$cyl
# [1] 6 6 4 6 8 6 8 4 4 6 6 8 8 8 8 8 8 4 4 4 4 8 8 8 8 4 4 4 8 6 8 4

1:3 := setNames(., .)
#1 2 3 
#1 2 3 

1:3 := sum(sqrt(x=.))
#[1] 4.146264

"x" := assign(., 6)
x
#[1] 6

1 := .+1 := .+2
#[1] 4

x <- data.frame(a=0)
x := f1(.) := f2(.)
#IN 1
#OUT 1
#IN 2
#OUT 2
#  a b c
#1 0 1 2

Speed

library(magrittr)

":=" <- function(lhs, rhs) {
  e <- exists(".", parent.frame(), inherits = FALSE)
  . <- get0(".", envir = parent.frame(), inherits = FALSE)
  assign(".", lhs, envir=parent.frame())
  on.exit(if(identical(lhs, get0(".", envir = parent.frame(), inherits = FALSE))) {
            if(e) {
              assign(".", ., envir=parent.frame())
            } else {
              if(exists(".", parent.frame())) rm(., envir = parent.frame())
            }
          })
  eval(substitute(rhs), parent.frame())
}

`%|%` <- function(lhs, rhs) {  #Overwrite and keep .
    assign(".", lhs, envir=parent.frame())
    eval(substitute(rhs), parent.frame())
}

x <- 42
bench::mark(min_time = 0.2, max_iterations = 1e8
, x
, identity(x)
, "|>" = x |> identity()
, "|> _" = x |> identity(x=_)
, "|> f()" = x |> (\(y) identity(y))()
, "%>%" = x %>% identity
, "->.;" = {x ->.; identity(.)}
, ":=" = x := identity(.)
, "%|%" = x %|% identity(.)
, "list." = x |> list() |> setNames(".") |> with(identity(.))
)

Result

#   expression       min   median `itr/sec` mem_alloc `gc/sec`   n_itr  n_gc
#   <bch:expr>  <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>   <int> <dbl>
# 1 x             9.89ns  10.94ns 66611556.        0B     11.7 5708404     1
# 2 identity(x) 179.98ns 200.12ns  4272195.        0B     49.6  603146     7
# 3 |>          179.98ns 201.05ns  4238021.        0B     41.1  722534     7
# 4 |> _        189.87ns 219.91ns  4067314.        0B     39.4  722803     7
# 5 |> f()      410.01ns 451.11ns  1889295.        0B     44.6  339126     8
# 6 %>%           1.27µs   1.39µs   632255.    5.15KB     43.2  117210     8
# 7 ->.;        289.87ns 330.97ns  2581693.        0B     27.0  477389     5
# 8 :=            6.46µs   7.12µs   131921.        0B     48.8   24330     9
# 9 %|%           2.05µs   2.32µs   394515.        0B     43.2   73094     8
#10 list.         2.42µs   2.74µs   340220.     8.3KB     42.3   64324     8

One difference is their placeholder: `_` in base R, `.` in magrittr.


Since R 4.2.0, the base R pipe has a placeholder for piped-in values, `_`, similar to the `.` of %>%, but its use is restricted to named arguments, and it can only be used once per call.

It is now possible to use a named argument with the placeholder _ in the rhs call to specify where the lhs is to be inserted. The placeholder can only appear once on the rhs.

To reiterate Ronak Shah's example, you can now use _ as a named argument on the right-hand side to refer to the left-hand side of the pipe:

c("dogs", "cats", "rats") |> 
    grepl("at", x = _)
#[1] FALSE  TRUE  TRUE

but it has to be named:

c("dogs", "cats", "rats") |> 
    grepl("at", _)
#Error: pipe placeholder can only be used as a named argument

and it cannot appear more than once (to overcome this, one can still use the solutions provided by Ronak Shah):

c("dogs", "cats", "rats") |> 
  expand.grid(x = _, y = _)
# Error in expand.grid(x = "_", y = "_") : pipe placeholder may only appear once

This, by contrast, is possible with magrittr:

library(magrittr)
c("dogs", "cats", "rats") %>% 
  expand.grid(x = ., y = .)
#     x    y
#1 dogs dogs
#2 cats dogs
#3 rats dogs
#4 dogs cats
#5 cats cats
#6 rats cats
#7 dogs rats
#8 cats rats
#9 rats rats
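With the native pipe, the same grid can be produced by the anonymous-function approach, which allows the piped-in value to be used as often as needed. A minimal sketch, assuming R 4.1 or later; `v` and `grid` are illustrative names.

```r
# The anonymous function binds the piped-in vector to v,
# so it can be passed to both arguments.
grid <- c("dogs", "cats", "rats") |>
  (\(v) expand.grid(x = v, y = v))()

dim(grid)
# [1] 9 2
```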
