
Using R to scrape data from a table populated possibly with javascript

Original post 2019-03-06 04:38:13. Tags: javascript, r, web-scraping

Hello fellow R fanatics...

I've been using R to scrape data from a variety of websites for a while now; however, this one has me stumped.

I am trying to scrape the data from the following table: http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=

However, my efforts thus far have failed.

I have tried the following:

1. A simple wget, which returns the page's HTML and some of the JavaScript functions used to populate the table. I haven't been able to work through it and find the parts I could use to grab the data with some of R's JS utilities. It may be that my experience with JS is too limited.
2. The solution from "Reading data from iframe", because it looked like the original website had the table in an iframe, but again no luck.
3. A combination of getURL and readHTMLTable:

       library(RCurl)  # provides getURL
       library(XML)    # provides readHTMLTable

       thisURL <- "http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b="
       theURL <- getURL(thisURL, .opts = list(ssl.verifypeer = FALSE))
       tables <- readHTMLTable(theURL)

   This results in an empty table.
4. Spent about an hour going through every part of the HTML and JavaScript code I could find, but with limited success, as detailed in 1.
5. It appears R's Selenium package might offer a solution, but I haven't yet figured out how to use it here, probably due to unfamiliarity.

I feel like I'm just missing an essential part here... perhaps due to my lack of knowledge of JS and XML?

UPDATE :

I've noticed that if I right-click on the table element and use Chrome's "inspect" it generates HTML that has all of the table's values in it and would be very scrape-able... I'm still not sure how to get to this point in R though. Anyone have hints on where to look in the "inspect" screen to try and guide my progress?
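For context, Chrome's "inspect" view shows the DOM after the page's JavaScript has run, which is why the values appear there but not in what getURL returns. One way to reach that rendered HTML from R is a browser driven through RSelenium. The following is only a rough sketch under several assumptions: a Selenium server is already running at localhost:4444 with Firefox available, the function name `fetch_rendered_tables` and the fixed wait are made up for illustration, and if the table actually sits inside an iframe the page source returned here may not include it.

```r
# Sketch only: needs the RSelenium and XML packages and a Selenium server
# that is already running. Address, port and browser are assumptions.
fetch_rendered_tables <- function(url, wait_seconds = 5) {
  remDr <- RSelenium::remoteDriver(
    remoteServerAddr = "localhost",
    port = 4444L,
    browserName = "firefox"
  )
  remDr$open(silent = TRUE)
  on.exit(remDr$close(), add = TRUE)  # always close the browser session

  remDr$navigate(url)
  Sys.sleep(wait_seconds)  # crude wait for the JavaScript to fill the table

  # getPageSource() returns a list; its first element is the rendered HTML
  rendered <- remDr$getPageSource()[[1]]
  XML::readHTMLTable(XML::htmlParse(rendered, asText = TRUE),
                     stringsAsFactors = FALSE)
}

# Usage (not run here, since it needs a live Selenium session):
# tables <- fetch_rendered_tables(
#   "http://www.vigimeteo.com/PREV/obs/obs_seul.html?a=07005&b=")
```

Driving a full browser is heavier than a plain HTTP request, so it is usually worth checking first whether the data can be fetched directly from the request that populates the table.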

1 answer

The solution was the following:

1. Using the page source, identify the source HTML page for the table.
2. Navigate to that source page and open Chrome Developer Tools > Network > XHR.
3. Refresh the page to find the request that supplies the data.
4. Scrape from that source.
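The steps above can be sketched in R roughly as follows. This is a sketch under stated assumptions, not the exact code used: the answer does not give the address found in the XHR tab, so `xhr_url` is a placeholder, and the response is assumed to be HTML containing a `<table>`. If the request returns JSON instead, a JSON parser would be the natural swap. The helper name `parse_table_response` and the sample response are invented for illustration.

```r
library(XML)  # htmlParse, readHTMLTable

# Hypothetical helper: turn the body of the XHR response into data frames.
# Assumes the response is HTML that contains at least one <table>.
parse_table_response <- function(html_text) {
  doc <- htmlParse(html_text, asText = TRUE)
  readHTMLTable(doc, stringsAsFactors = FALSE)
}

# Made-up stand-in for a real response, so the sketch runs offline.
sample_response <- "<table>
  <tr><th>station</th><th>temp</th></tr>
  <tr><td>07005</td><td>11.2</td></tr>
</table>"

tables <- parse_table_response(sample_response)
print(tables[[1]])

# Against the real site, the text would come from the request seen in
# Chrome DevTools > Network > XHR (placeholder, not the actual address):
# xhr_url <- "http://www.example.com/path/seen/in/devtools"
# tables <- parse_table_response(RCurl::getURL(xhr_url))
```

Requesting the data source directly avoids running any JavaScript, which is why the earlier getURL plus readHTMLTable attempt can work once it is pointed at the right URL.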

Thanks to @XR SC, whose answer to "web scraping using Chrome Dev Tools" provided the basic approach.


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.


solution1 0 ACCEPTED 2019-03-10 19:24:53
