I was scraping this Indian weather website:
http://202.54.31.7/citywx/localwx.php
The left pane lists all the Indian states, and hovering over a state lets you select a city or district. I chose Delhi -> Safdarjung from the left pane and saved the page locally with this code:
from BeautifulSoup import BeautifulSoup
import urllib, urllib2
imd_ind = urllib2.urlopen('http://202.54.31.7/citywx/localwx.php')
delhi_info = imd_ind.read()
open('delhi_info.html', 'w').write(delhi_info)
soup = BeautifulSoup(open('delhi_info.html'))
soup.prettify
This prints only the following:
<bound method BeautifulSoup.prettify of <html><head><title>Local Weather Forecast</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="MSHTML 5.00.2920.0" name="GENERATOR" /></head>
<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*"><frame name="menuFrame" noresize="noResize" src="menu.php" /><frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" /></frameset></html>
>
However, if I inspect the locally saved page "delhi_info.html" in Chrome, I can see a lot of information (date, temperature, cloudy, and so on, spread over many HTML elements). Why can't I see any of it through the BeautifulSoup methods? Please help.
The HTML you saved contains frame elements rather than the weather data itself. The frame named "mainframe" in your saved file has this attribute:
src="http://202.54.31.7/citywx/city_weather1.php?id=42182"
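BeautifulSoup only parses the HTML you give it. It does not load the documents that the frames point to, which is what Chrome does when it displays the page. So you need to extract this URL, open it, and then scrape the data from that response. Also note that soup.prettify without parentheses only shows the bound method; call soup.prettify() to get the formatted markup.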
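The extraction step could look roughly like the sketch below. It is written for Python 3 and, so that it runs without extra packages, uses the standard library's html.parser instead of BeautifulSoup to find the frames. The names FrameCollector, frame_urls and fetch are made up for this illustration, and the frameset string is copied from the output in the question. The relative src "menu.php" is resolved against the page URL with urljoin.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

# URL of the frameset page from the question.
BASE_URL = 'http://202.54.31.7/citywx/localwx.php'


class FrameCollector(HTMLParser):
    """Collects the name and src of every <frame> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.frames = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <frame ... /> also end up here.
        if tag == 'frame':
            attrs = dict(attrs)
            if attrs.get('src'):
                self.frames[attrs.get('name')] = attrs['src']


def frame_urls(html, base_url=BASE_URL):
    """Return {frame name: absolute URL} for the frames found in html."""
    collector = FrameCollector()
    collector.feed(html)
    return {name: urljoin(base_url, src)
            for name, src in collector.frames.items()}


def fetch(url):
    """Download url and return the raw bytes (needs network access)."""
    with urlopen(url) as response:
        return response.read()


# The frameset from the question, as saved in delhi_info.html.
frameset_html = (
    '<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*">'
    '<frame name="menuFrame" noresize="noResize" src="menu.php" />'
    '<frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" />'
    '</frameset>'
)

urls = frame_urls(frameset_html)
print(urls['mainframe'])

# Not run here because it needs the site to be reachable:
# weather_html = fetch(urls['mainframe'])
```

The bytes returned by fetch(urls['mainframe']) are the page that actually holds the weather table, so that is what you would pass to BeautifulSoup in place of delhi_info.html.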