
Using R to scrape data from a table populated possibly with javascript

Hello fellow R fanatics...

I've been using R to scrape data from a variety of websites for a while now, but this one has me stumped.

I am trying to scrape the data from the following table: http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=

However my efforts thus far have failed.

I have tried the following:

  1. A simple wget. This returns the site's HTML and some of the JavaScript functions used to populate the table, but I haven't been able to find the parts I could use to grab the data with R's JS utilities. My limited experience with JS may be part of the problem.
  2. The solution from "Reading data from iframe", because it looked like the original website had the table in an iframe, but again no luck.
  3. A combination of getURL and readHTMLTable:

    library(RCurl)  # getURL
    library(XML)    # readHTMLTable

    thisURL = "http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b="
    theURL = getURL(thisURL, .opts = list(ssl.verifypeer = FALSE))
    tables = readHTMLTable(theURL)

This results in an empty table.

  4. I spent about an hour going through every part of the HTML and JavaScript code I could find, but with limited success, as described in item 1.

It appears that R's Selenium package might offer a solution, but I haven't yet figured out how to use it here, probably because I'm unfamiliar with it.

I feel like I'm just missing an essential part here... perhaps due to my lack of knowledge of JS and XML?

UPDATE:

I've noticed that if I right-click on the table element and use Chrome's "inspect" it generates HTML that has all of the table's values in it and would be very scrape-able... I'm still not sure how to get to this point in R though. Anyone have hints on where to look in the "inspect" screen to try and guide my progress?

The solution to this was the following.

  1. In the page source, identify the HTML page that the table is actually loaded from.
  2. Navigate to that page and open Chrome Developer Tools > Network > XHR.
  3. Refresh the page and look through the listed requests to find the one that returns the data.
  4. Scrape from that request's URL directly.
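
As a rough illustration of step 4, here is a minimal sketch in R. It assumes the XHR request found in the Network tab returns either JSON or an HTML fragment containing a table; which of the two applies to this site is something to check in the response preview. The helper name parse_xhr_body, the jsonlite dependency, and the placeholder URL in the comments are my own additions for illustration, not part of the original solution.

```r
library(XML)       # htmlParse(), readHTMLTable()
library(jsonlite)  # fromJSON()

# Hypothetical helper: turn the body of an XHR response into R data.
# Assumption: the body is either JSON or an HTML fragment with <table> elements.
parse_xhr_body <- function(body) {
  if (grepl("^\\s*[\\[{]", body, perl = TRUE)) {
    # Body starts with "[" or "{", so treat it as JSON
    fromJSON(body)
  } else {
    # Otherwise parse it as HTML and extract every table as a data frame
    doc <- htmlParse(body, asText = TRUE)
    readHTMLTable(doc, stringsAsFactors = FALSE)
  }
}

# Hypothetical usage: replace the placeholder with the request URL copied
# from Chrome Developer Tools > Network > XHR.
# xhr_url <- "http://example.com/path/copied/from/network/tab"
# body    <- RCurl::getURL(xhr_url)
# result  <- parse_xhr_body(body)
```

The point of splitting out the parsing step is that the XHR response already contains the values the browser uses to fill the table, so no JavaScript needs to run in R.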

Thanks to @XR SC, whose answer to "web scraping using Chrome Dev Tools" provided the basic approach.
