How do I scrape the data from the Google Docs table on this web page?

Question

I'm trying to use Python to scrape the data from the table on this web page.

http://www.dividendyieldhunter.com/exchanged-traded-debt-issues-sorted-alphabetically/

I tried using requests and bs4. I get the raw HTML, but it looks like the data is hidden. What should I be trying?

Answer 1

That particular page loads its data from a URL in an iframe, as in this code:

<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0" style="display: block; width: 100%; height: 100%;"></iframe>

You would need to further request the HTML from the URL in the src attribute at:

https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0
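The src value can also be pulled out of the page programmatically instead of being copied by hand. Here is a minimal sketch using only the standard library's html.parser, run against an inline, shortened copy of the iframe tag above. It assumes the iframe tag is actually present in the HTML that requests returns, and the class name IframeSrcParser is made up for illustration. The parser decodes &amp; inside attribute values, so the extracted URL already contains a plain &.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with entities already decoded
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Inline copy of the iframe tag from the page (some attributes omitted)
page_html = (
    '<iframe id="pageswitcher-content" frameborder="0" '
    'src="https://docs.google.com/spreadsheets/d/'
    '1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet'
    '?headers=false&amp;gid=0"></iframe>'
)

parser = IframeSrcParser()
parser.feed(page_html)
iframe_url = parser.sources[0]
print(iframe_url)
```

In a real run, page_html would be the .text of the response for the original page.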

Then you could scrape the table with the class="waffle".
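As a rough sketch of that last step, the parser below collects the cell text of a table whose class is "waffle". It uses only the standard library and runs against a tiny made-up table (the tickers and coupons are placeholder values, not data from the sheet). The class name WaffleTableParser is hypothetical, and nested tables are not handled. With bs4, as in the question, the equivalent starting point would be soup.find("table", class_="waffle").

```python
from html.parser import HTMLParser


class WaffleTableParser(HTMLParser):
    """Collect cell text from <table class="waffle"> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_table = False
        self.in_cell = False
        self.rows = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "table" and "waffle" in classes:
            self.in_table = True
        elif self.in_table and tag == "tr":
            self.rows.append([])  # start a new row
        elif self.in_table and tag in ("td", "th") and self.rows:
            self.in_cell = True
            self.rows[-1].append("")  # start a new cell

    def handle_endtag(self, tag):
        if tag == "table":
            self.in_table = False
        elif tag in ("td", "th"):
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a cell of the target table
        if self.in_cell:
            self.rows[-1][-1] += data


# Made-up sample standing in for the HTML fetched from the iframe URL
sample_html = """
<table class="waffle">
  <tr><th>Ticker</th><th>Coupon</th></tr>
  <tr><td>AAA</td><td>5.00%</td></tr>
</table>
"""

parser = WaffleTableParser()
parser.feed(sample_html)
rows = [[cell.strip() for cell in row] for row in parser.rows]
print(rows)
```

In practice sample_html would be replaced by the text of the response fetched from the iframe URL.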

NOTE: Take care with the URL query parameters that come from the raw URL as in the example below.

For example, the &amp; near the end must be converted to a single & character for the requests module to find the proper URL, e.g.:

import requests

# Note the plain & before gid=0 (not &amp;)
res = requests.get("https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0")
print(res.text)
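If the URL is copied out of the raw HTML by a script rather than by hand, the conversion can be done with html.unescape from the standard library. A small sketch, where the variable names raw_src and clean_url are just illustrative:

```python
import html

# The src value exactly as it appears in the raw HTML, with &amp;
raw_src = (
    "https://docs.google.com/spreadsheets/d/"
    "1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet"
    "?headers=false&amp;gid=0"
)

# html.unescape turns HTML entities such as &amp; back into plain characters
clean_url = html.unescape(raw_src)
print(clean_url)
```

The resulting clean_url is the same string that is passed to requests.get above.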

1 answer

solution1
1 ACCEPTED 2016-12-19 04:06:39
