
Efficiency in extracting data from webscraping in R

This is no doubt very simple, so apologies, but I am new to webscraping and am trying to extract multiple datapoints in one call using rvest. Take the following code as an example (NB: I have not used the actual website, which I have replaced in this snippet with xxxxxx.com):

library(rvest)

univsalaries <- lapply(paste0('https://xxxxxx.com/job/p', 1:20,'/key=%F9%80%76&final=1&jump=1&PGTID=0d3408-0000-24gf-ac2b-810&ClickID=2'),
                   function(url_base){
                     url_base %>% read_html() %>% 
                       html_nodes('.salary') %>% 
                       html_text()
                   })

Read the webpage once, and then you can extract multiple values from the same page:

library(purrr)
library(rvest)

univsalaries <- map(paste0('https://xxxxxx.com/job/p', 1:20,'/key=%F9%80%76&final=1&jump=1&PGTID=0d3408-0000-24gf-ac2b-810&ClickID=2'),
                       function(url_base){
                         # Parse each page once, then pull every field
                         # from the already-parsed document
                         webpage <- url_base %>% read_html() 
                         data.frame(Salary = webpage %>% html_nodes('.salary') %>% html_text(), 
                                    Company = webpage %>% html_nodes('.company') %>% html_text())
                       })
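Since `map()` returns a list with one data frame per page, a common follow-up step is to stack them into a single data frame. A minimal sketch using `dplyr::bind_rows()`, assuming `univsalaries` is the list produced above:

```r
library(dplyr)

# Combine the per-page data frames into one; .id adds a column
# recording which page (list element) each row came from.
all_salaries <- bind_rows(univsalaries, .id = "page")
```

`purrr::map_dfr()` would achieve the same result in a single call, at the cost of failing the whole run if any one page errors.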

