How do I scrape the data from the Google Docs table on this web page?

I'm trying to use Python to scrape the data from the table on this web page.

http://www.dividendyieldhunter.com/exchanged-traded-debt-issues-sorted-alphabetically/

I tried using requests and bs4. I get the raw HTML, but the table data doesn't seem to be in it. What should I try instead?

That particular page loads the data from a different URL inside an iframe, defined in this markup:

<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0" style="display: block; width: 100%; height: 100%;"></iframe>
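If you'd rather not copy the src value by hand, you could pull it out of the outer page's HTML. Below is a minimal sketch using the standard library's html.parser, assuming the outer page still contains an iframe with id="pageswitcher-content" (the site may change its markup at any time). The names IframeSrcParser and find_iframe_src are hypothetical helpers, not part of any library. One useful side effect: html.parser decodes character references in attribute values, so the returned URL already contains a plain & rather than &amp;.

```python
from html.parser import HTMLParser


class IframeSrcParser(HTMLParser):
    """Record the src of the first <iframe> whose id equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.src = None

    def handle_starttag(self, tag, attrs):
        # Only look at iframes, and stop once a match has been found.
        if tag != "iframe" or self.src is not None:
            return
        attributes = dict(attrs)
        # html.parser has already decoded "&amp;" to "&" in attribute values.
        if attributes.get("id") == self.target_id:
            self.src = attributes.get("src")


def find_iframe_src(page_html, target_id="pageswitcher-content"):
    """Return the iframe's src URL, or None if no matching iframe exists."""
    parser = IframeSrcParser(target_id)
    parser.feed(page_html)
    parser.close()
    return parser.src
```

You would pass it the outer page's HTML, for example the .text of a requests.get() call on the dividendyieldhunter.com URL, and then request the URL it returns.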

You would need to make a second request for the HTML at the URL given in the iframe's src attribute:

https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&amp;gid=0

Then you can scrape the table that has class="waffle".
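Since you already use bs4, the lookup there would be soup.find("table", class_="waffle"). To keep the sketch free of third-party dependencies, here is a rough equivalent built on the standard library's html.parser. It assumes that the published sheet renders data cells as <td> and row/column labels as <th> (only <td> text is kept), and that the waffle table contains no nested tables. WaffleTableParser and parse_waffle_table are hypothetical names.

```python
from html.parser import HTMLParser


class WaffleTableParser(HTMLParser):
    """Collect the text of <td> cells from <table class="waffle"> elements.

    Assumptions: data cells are <td> (row/column labels are <th> and are
    skipped), and the waffle table contains no nested tables.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cell strings per table row
        self._in_table = False  # True while inside a waffle table
        self._row = None        # cells of the <tr> currently being read
        self._cell = None       # text fragments of the <td> being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            if "waffle" in classes:
                self._in_table = True
        elif not self._in_table:
            return
        elif tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag == "td" and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows that had no <td> cells
                self.rows.append(self._row)
            self._row = None
            self._cell = None
        elif tag == "table":
            self._in_table = False

    def handle_data(self, data):
        # Entities such as "&amp;" are already converted by the parser.
        if self._cell is not None:
            self._cell.append(data)


def parse_waffle_table(html_text):
    """Return the waffle table's data cells as a list of rows."""
    parser = WaffleTableParser()
    parser.feed(html_text)
    parser.close()
    return parser.rows
```

You would feed it the text of the response fetched from the iframe URL. If the real sheet uses <td> for its label cells too, the first cell of each row would need to be dropped; check the HTML you actually receive.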

NOTE: Take care with query parameters copied from the raw HTML source, as in the example below.

The &amp; near the end of the src value is an HTML-escaped ampersand. It must be converted to a single & character; otherwise requests sends a query string other than the intended headers=false&gid=0. For example:

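import requests
# Plain "&" before gid=0, not the "&amp;" seen in the page source
url = "https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0"
res = requests.get(url)
res.raise_for_status()  # raise an error on a 4xx/5xx response
print(res.text)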
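Rather than editing the &amp; by hand, you can let the standard library decode it. This small sketch runs html.unescape on the src value exactly as it appears in the page source, then uses urllib.parse to confirm that the query string now splits into the two intended parameters.

```python
import html
from urllib.parse import parse_qs, urlsplit

# The src value as it appears in the page source, with the escaped "&amp;".
raw_src = ("https://docs.google.com/spreadsheets/d/"
           "1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/"
           "pubhtml/sheet?headers=false&amp;gid=0")

# Decode HTML character references ("&amp;" becomes "&").
sheet_url = html.unescape(raw_src)

# After decoding, the query string splits into the two intended parameters.
params = parse_qs(urlsplit(sheet_url).query)
print(params)  # {'headers': ['false'], 'gid': ['0']}
```

Alternatively, you can avoid the issue entirely by passing the parameters separately, e.g. requests.get(base_url, params={"headers": "false", "gid": "0"}), where base_url is the part of the URL before the ?; requests then builds and encodes the query string itself.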
