New to web scraping.
I need to get the data from the Daily Observations table (the long table at the end of the page) on this page:
The html of the table starts from <table _ngcontent-c16="" class="tablesaw-sortable" id="history-observation-table">
My code is:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.wunderground.com/history/daily/us/tx/greenville/KGVT/date/2015-01-05?cm_ven=localwx_history"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
soup.findAll(class_="region-content-observation")
And the output is:
[<div class="region-content-observation">
<city-history-observation _nghost-c34=""><div _ngcontent-c34="">
<div _ngcontent-c34="" class="observation-title">Daily Observations</div>
<!-- -->
No Data Recorded
<!-- -->
</div></city-history-observation>
</div>]
So it isn't getting the table and returns "No Data Recorded", although it did get the title.
And when I tried
soup.findAll(class_="tablesaw-sortable")
or
soup.findAll('tr')
each one returned an empty list.
Does anyone know what went wrong?
If you open the web page in Firefox, you can use the Network tab in its Developer Tools to see all the different web resources that are downloaded. The data you are interested in is actually provided by this JSON file, which can be retrieved and then parsed using Python's json library.
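The retrieve-and-parse pattern might look like the sketch below. The exact JSON URL has to be copied from the Network tab (it embeds a site-specific API key), so the URL shown in the comment and the sample payload's field names (observations, temp, valid_time_gmt) are placeholders, not the real response schema:

```python
import json
from urllib.request import urlopen

def fetch_json(url):
    """Download a URL and parse the response body as JSON."""
    with urlopen(url) as response:
        return json.load(response)

# In practice, paste the URL copied from the browser's Network tab:
# data = fetch_json("https://api.weather.com/...copied-from-network-tab...")

# A tiny made-up payload just to show the parsing step; inspect the
# real response's structure first (e.g. print(data.keys())).
sample = '{"observations": [{"valid_time_gmt": 1420416300, "temp": 33}]}'
data = json.loads(sample)
for obs in data["observations"]:
    print(obs["temp"])
```

Whatever the real structure turns out to be, the result is plain Python dicts and lists, which is much easier to work with than the JavaScript-rendered HTML that BeautifulSoup never sees.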
Note: I've never scraped a site that uses API keys, so I'm not sure about the ethics or best practice in this situation. As a test, I was able to download the JSON file without any problems. However, I suspect Weather Underground wouldn't want you using their key too many times, and it looks like they no longer provide free weather API keys.