I'm trying to use Python to scrape the data from the table on this web page.
http://www.dividendyieldhunter.com/exchanged-traded-debt-issues-sorted-alphabetically/
I tried using requests and bs4. I get the raw HTML, but the table data does not appear in it. What should I try?
That particular page loads its data from a URL in an iframe, in this code:
<iframe id="pageswitcher-content" frameborder="0" marginheight="0" marginwidth="0" src="https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0" style="display: block; width: 100%; height: 100%;"></iframe>
You would need to further request the HTML from the URL in the src attribute at:
https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0
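Rather than copying the URL by hand, the src attribute could be read with bs4. The following is only a sketch: it assumes the iframe is present in the HTML that requests returns, and SAMPLE_PAGE (with its SHEET_ID placeholder) and the helper name find_iframe_src are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the outer page.
# Note the &amp; entity inside the src attribute, as it appears in raw HTML.
SAMPLE_PAGE = """
<html><body>
<iframe id="pageswitcher-content"
        src="https://docs.google.com/spreadsheets/d/SHEET_ID/pubhtml/sheet?headers=false&amp;gid=0"></iframe>
</body></html>
"""


def find_iframe_src(html, iframe_id="pageswitcher-content"):
    """Return the src of the iframe with the given id, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    frame = soup.find("iframe", id=iframe_id)
    return frame.get("src") if frame else None


# The parser decodes HTML entities in attribute values, so the result
# contains a plain & and can be passed to requests.get() as-is.
print(find_iframe_src(SAMPLE_PAGE))
```

In practice the html argument would be the text of the response for the outer page, e.g. requests.get(page_url).text.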
Then you could scrape the table with the class="waffle".
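A minimal sketch of that step with bs4 might look like the following. SAMPLE_SHEET is invented data that only loosely imitates the published-sheet markup, and waffle_rows is a hypothetical helper name. Skipping the th cells assumes they hold row numbers rather than data, which should be checked against the real HTML.

```python
from bs4 import BeautifulSoup

# Invented sample standing in for the HTML returned by the spreadsheet URL.
SAMPLE_SHEET = """
<table class="waffle">
  <tr><th>1</th><td>Ticker</td><td>Coupon</td></tr>
  <tr><th>2</th><td>ABC</td><td>5.00%</td></tr>
</table>
"""


def waffle_rows(html):
    """Return the text of each <td> cell, row by row, from table.waffle."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="waffle")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        # Only <td> cells are collected; <th> cells are assumed to be row numbers.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)
    return rows


for row in waffle_rows(SAMPLE_SHEET):
    print(row)
```

With the real page, the html argument would be res.text from the request to the spreadsheet URL.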
NOTE: Take care with the URL query parameters that come from the raw URL, as in the example below.
For example, an &amp; entity near the end of the raw URL must be converted to a single & character for the requests module to find the proper URL, e.g.:
import requests

# Request the spreadsheet HTML directly; note the plain & before gid=0.
res = requests.get("https://docs.google.com/spreadsheets/d/1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet?headers=false&gid=0")
print(res.text)
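If the URL is taken from the raw HTML as a string, the entity conversion does not have to be done by hand. As a sketch, html.unescape from the standard library turns &amp; back into &. The variable names raw_src and clean_url are illustrative only.

```python
from html import unescape

# The src value as it might appear in raw HTML, with & written as &amp;.
raw_src = (
    "https://docs.google.com/spreadsheets/d/"
    "1_HY2XEBKcyi4STki-uUbOfr-su8CZOfpi-jM1Racwyw/pubhtml/sheet"
    "?headers=false&amp;gid=0"
)

# unescape() decodes HTML entities, leaving a URL that requests can use.
clean_url = unescape(raw_src)
print(clean_url)
```

The resulting clean_url can then be passed to requests.get() in place of the hard-coded string.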