Web scraping a table with R

I am trying to scrape a table from the PitchBook website. Plain HTML scraping does not work because PitchBook loads the data with JavaScript rather than serving it in the HTML, so I need to execute the JS in order to extract the information from the JSON file. This is my code:

    library(httr)
    library(jsonlite)
    library(magrittr)
    json = get("https://my.pitchbook.com/old/homeContent.64ea0536fd321cc1dd3b.js") %>%
      content(as = 'text') %>%
      fromJSON()

I get this error:

    Error in get("https://my.pitchbook.com/old/homeContent.64ea0536fd321cc1dd3b.js") :
      object 'https://my.pitchbook.com/old/homeContent.64ea0536fd321cc1dd3b.js' not found

Whatever data I try to load, it returns the same error. I would appreciate your help. Thank you!

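You have called base::get, not httr::GET. R is case-sensitive, and base::get looks up an object by name, which is why the error reports the URL as an object that was not found. So it should be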

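    library(httr)
    library(jsonlite)
    library(magrittr)

    # httr::GET (upper case) performs the HTTP request
    json <- GET(
      "https://my.pitchbook.com/old/homeContent.64ea0536fd321cc1dd3b.js"
    ) %>%
      content("text") %>%  # response body as a character string
      fromJSON()           # parse that string as JSON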

However, I'm not entirely sure that this URL returns valid JSON: it points to a .js file, and the call above by itself gives

    lexical error: invalid char in json text.
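
One way to avoid hitting that lexical error is to check whether the downloaded text is JSON before handing it to fromJSON(). The following is only a minimal sketch: parse_if_json is a hypothetical helper name, and the two strings are made-up stand-ins for a response body, not real PitchBook output. It relies on jsonlite::validate(), which reports whether a string is well-formed JSON.

```r
library(jsonlite)

# Hypothetical helper: parse text as JSON only if it validates, otherwise return NULL.
parse_if_json <- function(txt) {
  if (isTRUE(jsonlite::validate(txt))) {
    fromJSON(txt)
  } else {
    NULL
  }
}

# Made-up example of a JavaScript bundle: it is not JSON, so it is rejected
# instead of raising a lexical error.
js_text <- "!function(e){var t={};}(window);"
is.null(parse_if_json(js_text))

# Made-up example of well-formed JSON: it parses as usual.
json_text <- '{"company": "Example", "deals": [1, 2, 3]}'
parsed <- parse_if_json(json_text)
parsed$deals
```

With a real request you would pass content(response, "text") to the helper. Checking httr::http_type(response) first can also tell you whether the server claims to be sending "application/json" at all.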
