
How do you subset data from a list in R?

I would like to extract the job descriptions, i.e. the "p" tag HTML elements, from all 16 pages generated by the last line of code.

"ret" is a list of 16 HTML pages generated by the last line of code. I'm not used to working with lists of lists, so I'm confused about how to extract the data from them.

Normally I would use

ret %>%
  html_elements("body p")

But I'm getting this error message: Error in UseMethod("xml_find_all") : no applicable method for 'xml_find_all' applied to an object of class "list"
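To make the error concrete, here is a minimal sketch of the difference between the list and one of its elements. The two in-memory pages built with minimal_html() are made-up stand-ins for the scraped pages, and `pages` and `first_p` are illustrative names, not part of the original code.

```r
library(rvest)

# Hypothetical stand-ins for the scraped pages, built in memory
pages <- list(
  minimal_html("<p>First job description</p>"),
  minimal_html("<p>Second job description</p>")
)

class(pages)       # a plain list, which html_elements() cannot search
class(pages[[1]])  # an xml_document, which html_elements() expects

# Selecting a single page from the list first works as usual
first_p <- pages[[1]] %>%
  html_elements("body p") %>%
  html_text2()
```

So the selector itself is fine; it just has to be applied to each element of the list rather than to the list as a whole.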

library(tidyverse)
library(rvest)
library(xml2)

url <- "https://www.indeed.com/jobs?q=data%20analyst&l=San%20Francisco%2C%20CA&vjk=0c2a6008b4969776"
# read_html() reads the page's HTML and parses it into its elements (<div>, <span>, <p>, etc.)
page <- xml2::read_html(url)

#get job title
title<-page %>%
  html_nodes(".jobTitle") %>%
  html_text()
  
#get company Location
loc<-page %>%
  html_nodes(".companyLocation") %>%
  html_text()

#job snippet
page %>%
  html_nodes(".job-snippet") %>%
  html_text()

# get the relative link to each job posting
desc <- page %>%
  html_nodes("a[data-jk]") %>%
  html_attr("href")

# build the full URL for each posting
combined_link <- paste0("https://www.indeed.com", desc)

# open the first link in a session and pull its job description
# (session() replaces html_session(), which is deprecated as of rvest 1.0.0)
page1 <- session(combined_link[[1]])
page1 %>%
  html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
  html_text()

#one<- page %>% html_elements("a[id*='job']")

# read every job page, returning a list of parsed HTML documents

ret <- lapply(paste0("https://www.indeed.com", desc), read_html)

  

html_nodes() has to be applied to each page in the list rather than to the list itself. We could either use lapply from base R

out <- lapply(ret, function(x) x %>%
  html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
  html_text())

or loop with map from purrr

library(purrr)
out <- map(ret, ~ .x %>%
  html_nodes(".iCIMS_JobContent, #jobDescriptionText") %>%
  html_text())

NOTE: Both approaches loop over the elements of the list. The x (in function(x)) or .x (in the ~ shorthand from the tidyverse) is the individual element passed to the anonymous function, i.e. a function created on the fly.
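If the goal is the "p" elements specifically, the same pattern should work with the selector from the question. Below is a small sketch under that assumption. It runs on made-up in-memory pages (via minimal_html()) instead of the live site, and `pages`, `paras`, and `descriptions` are illustrative names.

```r
library(rvest)
library(purrr)

# Hypothetical stand-ins for the list of scraped pages (ret in the question)
pages <- list(
  minimal_html("<p>Analyst role A</p><p>SQL required</p>"),
  minimal_html("<p>Analyst role B</p>")
)

# One character vector of paragraph text per page
paras <- map(pages, ~ .x %>%
  html_elements("body p") %>%
  html_text2())

# Collapse each page's paragraphs into a single string, one per page
descriptions <- map_chr(paras, paste, collapse = "\n")
```

Here `paras` stays a list with one entry per page, while `descriptions` is a plain character vector that is easier to put into a data frame column next to the job titles.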
