I would like to extract the job descriptions, i.e. the "p" tag HTML elements, from all 16 pages generated by the last line of code.
"ret" is a list of 16 HTML pages generated by the last line of code. I'm not used to working with lists of lists, so I'm confused about how to extract the data from them.
Normally I would use
ret %>%
  html_elements("body p")
but I'm getting the error message: "Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "list""
library(tidyverse)
library(rvest)
library(xml2)
url<-"https://www.indeed.com/jobs?q=data%20analyst&l=San%20Francisco%2C%20CA&vjk=0c2a6008b4969776"
page <- xml2::read_html(url) # reads the page's HTML and parses it into its elements (<div>, <span>, <p>, etc.)
#get job title
title<-page %>%
html_nodes(".jobTitle") %>%
html_text()
#get company Location
loc<-page %>%
html_nodes(".companyLocation") %>%
html_text()
#get job snippet
snippet <- page %>%
html_nodes(".job-snippet") %>%
html_text()
#get the relative link to each job posting
desc<- page %>%
html_nodes("a[data-jk]") %>%
html_attr("href")
# Build the full link by prepending the site address
combined_link <- paste("https://www.indeed.com", desc, sep="")
#Open a session on the first combined link (html_session() was renamed session() in rvest 1.0.0)
page1 <- session(combined_link[[1]])
page1 %>%
html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
html_text()
#one<- page %>% html_elements("a[id*='job']")
#read every linked page; returns a list of parsed pages
ret <- lapply(paste0("https://www.indeed.com", desc), read_html)
We could either use lapply from base R
out <- lapply(ret, function(x) x %>%
html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
html_text())
or loop with map from purrr
library(purrr)
out <- map(ret, ~ .x %>%
html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
html_text())
NOTE: Both loop over the elements of the list. The x or .x is the individual element, passed to an anonymous function (i.e. a function created on the fly), written as function(x) in base R or with the ~ shorthand in the tidyverse.
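To see the difference without hitting the site, here is a minimal sketch that swaps the downloaded pages for two tiny HTML strings. The `pages` list and its contents are made up for illustration, and it assumes rvest 1.0 or later (for `html_elements`); the selector is narrowed to the `p` tags inside `#jobDescriptionText`, which may not match the real page layout.

```r
library(rvest)  # re-exports read_html() and %>%
library(purrr)

# Hypothetical stand-ins for the downloaded pages, parsed from literal HTML strings
pages <- list(
  read_html('<div id="jobDescriptionText"><p>Analyze data</p></div>'),
  read_html('<div id="jobDescriptionText"><p>Build reports</p></div>')
)

# Calling html_elements() on the list itself fails, because it expects one document
err <- tryCatch(html_elements(pages, "p"), error = function(e) conditionMessage(e))

# Base R: anonymous function written out with function(x)
out_base <- lapply(pages, function(x) {
  x %>% html_elements("#jobDescriptionText p") %>% html_text()
})

# purrr: the same loop with the ~ shorthand, where .x is the current page
out_purrr <- map(pages, ~ .x %>% html_elements("#jobDescriptionText p") %>% html_text())

# Each result is a list with one character vector per page;
# unlist() flattens it into a single character vector
all_desc <- unlist(out_base)
```

Both loops return the same list, so the choice between them is mostly a matter of style. If one vector of descriptions is more convenient than a list, `unlist()` (or `purrr::flatten_chr()`) collapses the result.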