
BeautifulSoup returns No Data Recorded when getting table from web

New to web scraping.

I need to get the Daily Observations table (the long table at the end of the page) data from the page:

https://www.wunderground.com/history/daily/us/tx/greenville/KGVT/date/2015-01-05?cm_ven=localwx_history

The HTML of the table starts with <table _ngcontent-c16="" class="tablesaw-sortable" id="history-observation-table">

My code is:

from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://www.wunderground.com/history/daily/us/tx/greenville/KGVT/date/2015-01-05?cm_ven=localwx_history"
html = urlopen(url)
soup = BeautifulSoup(html, 'lxml')
soup.findAll(class_="region-content-observation")

And the output is:

[<div class="region-content-observation">
 <city-history-observation _nghost-c34=""><div _ngcontent-c34="">
 <div _ngcontent-c34="" class="observation-title">Daily Observations</div>
 <!-- -->
     No Data Recorded

   <!-- -->
 </div></city-history-observation>
 </div>]

So it does not get the table and returns No Data Recorded instead, although it does get the title.

And when I tried

soup.findAll(class_="tablesaw-sortable")

or

soup.findAll('tr')

it only returned an empty list.

Does anyone know what went wrong?

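If you open the web page in Firefox, you can use the Network tab of its Developer Tools to see all the web resources that are downloaded. The table is filled in by JavaScript after the page loads (the _ngcontent attributes suggest an Angular app), so the static HTML that urlopen receives contains only the No Data Recorded placeholder. The data you are interested in is actually provided by this JSON file, which can be retrieved and then parsed using Python's json library.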
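As a rough sketch of that approach, the snippet below downloads a JSON resource and pulls a list of records out of it. The URL is a hypothetical placeholder (copy the real request URL from the Network tab), and the "observations" key and the helper names fetch_json and extract_rows are assumptions made for illustration, not the actual layout of the Weather Underground response.

```python
import json
from urllib.request import Request, urlopen

# Hypothetical placeholder: replace with the request URL copied from the
# browser's Network tab (it will include the site's own query parameters).
JSON_URL = "https://example.com/path/to/observations.json"


def fetch_json(url, timeout=30):
    """Download a URL and decode its body as JSON."""
    # Some servers reject requests that lack a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_rows(payload, list_key="observations"):
    """Return the dict records stored under list_key, or [] if absent.

    The key name "observations" is an assumption; print the payload's
    keys to find the real one.
    """
    rows = payload.get(list_key, [])
    return [row for row in rows if isinstance(row, dict)]


if __name__ == "__main__":
    data = fetch_json(JSON_URL)
    print(list(data.keys()))  # inspect the top-level structure first
    for row in extract_rows(data):
        print(row)
```

Printing the top-level keys first is deliberate: once the real structure is known, the list of rows can be handed to csv.DictWriter or pandas.DataFrame to rebuild the table.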

Note: I've never scraped a site that uses API keys, so I'm not sure about the ethics or best practice in this situation. As a test, I was able to download the JSON file without any problems. However, I suspect Weather Underground wouldn't want you using their key too many times, and it looks like they no longer provide free weather API keys.

