Calling two different functions on the same object in a pipe (%>%)

Question

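I would like to know whether there is a way to call both html_name() and html_text() (from the rvest package) on the same object and store the two different results within a single pipe (magrittr::%>%).

Here is an example: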

uniprot_ac <- "P31374"

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
    content(as = "raw", content = "text/xml") %>%
    read_html %>%
    html_nodes(xpath = '//recommendedname/* |
               //name[@type="primary"] | //comment[@type="function"]/text |
               //comment[@type="interaction"]/text')

At this point I would like to get both the tag names from html_name():

[1] "fullname" "ecnumber" "name"     "text"

AND the tag contents, without having to create a separate object by rewriting the whole pipe just to change the last line to html_text():

[1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
[2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
[3] "PSK1"                                                                                                                                                                                                                                                                                                             
[4] "Serine/threonine-protein kinase involved ... ...

The desired output could look something like this; whether it is a vector or a data frame does not matter:

  [1] fullname: "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
  [2] ecnumber: "2.7.11.1"                                                                                                                                                                                                                                                                                                         
  [3] Name: "PSK1"                                                                                                                                                                                                                                                                                                             
  [4] Text: "Serine/threonine-protein kinase involved ... ...

Answer 1

Maybe a bit of a hack, but you can use a parenthesized anonymous function in the pipe:

library("magrittr")
library("httr")
library("xml2")
library("rvest")

uniprot_ac <- "P31374"

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", content = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
             //name[@type="primary"] | //comment[@type="function"]/text |
             //comment[@type="interaction"]/text') %>% 
  (function(x) list(name = html_name(x), text = html_text(x)))
#$name
#[1] "fullname" "ecnumber" "name"     "text"    
#
#$text
#[1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
#[2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
#[3] "PSK1"                                                                                                                                                                                                                                                                                                             
#[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."

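Alternatively, you could do something more elegant with the purrr package, but I don't see a reason to load an entire package just for that.

Edit: As @MrFlick pointed out in the comments, the dot (.) placeholder can do the same thing if it is properly wrapped in curly braces.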

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", content = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
             //name[@type="primary"] | //comment[@type="function"]/text |
             //comment[@type="interaction"]/text') %>% 
  {list(name = html_name(.), text = html_text(.))}

This is arguably the more magrittr-idiomatic way to do it, and it is actually documented in help("%>%").
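Why the braces matter can be seen in a small offline example. This is only a minimal sketch: the character vector x is a stand-in for the scraped nodes, and toupper() and nchar() stand in for html_name() and html_text(). None of these stand-ins come from the original answer.

```r
library(magrittr)

# Stand-in for the node set (assumption, for illustration only)
x <- c("alpha", "beta", "gamma")

# Without braces, magrittr still inserts the left-hand side as the first
# argument, because the dot only appears inside nested calls.
no_braces <- x %>% list(upper = toupper(.), n = nchar(.))
length(no_braces)    # three elements: x itself, then upper and n

# With braces, the expression is evaluated as written, with . bound to x.
with_braces <- x %>% { list(upper = toupper(.), n = nchar(.)) }
length(with_braces)  # two elements: upper and n
```

The parenthesized anonymous function shown earlier avoids the extra first argument for the same reason: the right-hand side is then a function to be called, not a call to be modified.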

Answer 2

You can create a custom function that takes the html_nodes object and performs whatever operations you need on it:

html_name_text <- function(nodes) {
    list(html_name(nodes), html_text(nodes))
}

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
    content(as = "raw", content = "text/xml") %>%
    read_html %>%
    html_nodes(xpath = '//recommendedname/* |
               //name[@type="primary"] | //comment[@type="function"]/text |
               //comment[@type="interaction"]/text') %>%
    html_name_text()

[[1]]
[1] "fullname" "ecnumber" "name"     "text"    

[[2]]
[1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
[2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
[3] "PSK1"                                                                                                                                                                                                                                                                                                             
[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."
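If the labelled output sketched in the question is preferred, a similar helper could return a named character vector instead of an unnamed list. The sketch below is an assumption-laden illustration: the helper name html_named_text is hypothetical, and a small inline HTML string replaces the UniProt request so that the example runs offline.

```r
library(magrittr)
library(rvest)

# Hypothetical helper: tag names become the names of the text vector
html_named_text <- function(nodes) {
  setNames(html_text(nodes), html_name(nodes))
}

# Inline stand-in for the downloaded document (assumption, not real UniProt output)
doc <- read_html(
  "<div><fullname>Serine/threonine-protein kinase PSK1</fullname><ecnumber>2.7.11.1</ecnumber></div>"
)

res <- doc %>%
  html_nodes(xpath = "//fullname | //ecnumber") %>%
  html_named_text()

names(res)  # tag names, in document order
```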

Answer 3

Here is a purrr approach that returns a tibble:

library(tidyverse)
library(rvest)

uniprot_ac <- "P31374"
read_html(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  html_nodes(xpath = '//recommendedname/* |
               //name[@type="primary"] | //comment[@type="function"]/text |
               //comment[@type="interaction"]/text') %>% 
  map(~ list(name = html_name(.), text = html_text(.))) %>%
  bind_rows
#> # A tibble: 4 x 2
#>   name     text                                                            
#>   <chr>    <chr>                                                           
#> 1 fullname Serine/threonine-protein kinase PSK1                            
#> 2 ecnumber 2.7.11.1                                                        
#> 3 name     PSK1                                                            
#> 4 text     Serine/threonine-protein kinase involved in the control of suga~

Created on 2019-03-26 by the reprex package (v0.2.1)
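Since html_name() and html_text() both accept a whole node set, a two-column table can probably also be built without mapping over the nodes. The following is a sketch under assumptions: it uses base data.frame() in place of a tibble, and an inline HTML string in place of the UniProt download.

```r
library(magrittr)
library(rvest)

# Inline stand-in for the downloaded document (assumption, for illustration only)
doc <- read_html(
  "<div><fullname>Serine/threonine-protein kinase PSK1</fullname><name>PSK1</name></div>"
)

tbl <- doc %>%
  html_nodes(xpath = "//fullname | //name") %>%
  # Braces keep magrittr from also inserting the nodes as a first argument
  { data.frame(name = html_name(.), text = html_text(.), stringsAsFactors = FALSE) }

tbl
```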

Answer 4

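One option is to use curly braces after the pipe, store the current result in a temporary object (if needed), and then compute the different results you want: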

GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
    content(as = "raw", content = "text/xml") %>%
    read_html %>%
    html_nodes(xpath = '//recommendedname/* |
               //name[@type="primary"] | //comment[@type="function"]/text |
               //comment[@type="interaction"]/text') %>% {
    list(name = html_name(.), text = html_text(.))
    }

FYI, sometimes you need to go through a temporary object, as in this example:

iris %>% 
  select(Sepal.Length, Sepal.Width) %>% {
     temp <- .
     bind_rows(temp %>% filter(Sepal.Length > 5), 
               temp %>% filter(Sepal.Width <= 3))
} %>% 
  dim()

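In this case, if you replace temp with . directly, it won't work: a pipeline that starts with . is treated by magrittr as the definition of a function (a functional sequence) rather than being evaluated immediately.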
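The difference can be checked offline with base functions. This is a minimal sketch, assuming a numeric vector in place of the iris data frame and min() and max() in place of the dplyr filters:

```r
library(magrittr)

x <- c(1, 5, 10)

# Going through a temporary object: both inner pipelines are evaluated
res <- x %>% {
  temp <- .
  c(temp %>% min(), temp %>% max())
}

# A pipeline whose left-hand side is the bare dot is not evaluated;
# it yields a function that can be called later.
fseq <- . %>% min()
is.function(fseq)
fseq(x)
```

Here res holds the minimum and maximum of x, while fseq is a function until it is called on some input.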

Answer 5

Without extra packages, and without too many brackets and dots, you can do this:

nodes %>% lapply(list(html_name, html_text), function(x,y) x(y), .)
# [[1]]
# [1] "fullname" "ecnumber" "name"     "text"    
# 
# [[2]]
# [1] "Serine/threonine-protein kinase PSK1"                                                                                                                                                                                                                                                                             
# [2] "2.7.11.1"                                                                                                                                                                                                                                                                                                         
# [3] "PSK1"                                                                                                                                                                                                                                                                                                             
# [4] "Serine/threonine-protein kinase involved in the control of sugar

Or the following, slightly more compact but with braces:

nodes %>% {lapply(list(html_name, html_text), do.call, list(.))}

I would use purrr though, to loop over the functions and pass each of them to exec along with . as a parameter:

library(purrr)
nodes %>% map(list(html_name, html_text), exec, .)

(same output)

Data
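A small variation on the lapply() version is to name the functions in the list, so that the result is labelled. The sketch below assumes base functions toupper() and nchar() as stand-ins for html_name() and html_text(), and a character vector as a stand-in for nodes, so that it runs without a network connection.

```r
library(magrittr)

# Stand-ins (assumptions, for illustration only)
x <- c("alpha", "beta", "gamma")
funs <- list(upper = toupper, n = nchar)

# The dot appears as a direct argument, so x is passed only there
# and each function in funs is applied to the same object.
res <- x %>% lapply(funs, function(f, obj) f(obj), .)

names(res)  # the names given to the functions carry over to the results
```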

library("magrittr")
library("httr")
library("xml2")
library("rvest")

uniprot_ac <- "P31374"  # UniProt accession used in the question
nodes <- GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
  content(as = "raw", content = "text/xml") %>%
  read_html %>%
  html_nodes(xpath = '//recommendedname/* |
             //name[@type="primary"] | //comment[@type="function"]/text |
             //comment[@type="interaction"]/text')

5 answers

Answer 1: score 5, accepted, 2019-03-26 19:08:14
Answer 2: score 4, 2019-03-26 19:09:33
Answer 3: score 4, 2019-03-26 19:22:34
Answer 4: score 3, 2019-03-26 19:23:18
Answer 5: score 3, 2019-03-27 10:50:48
