Some issues with web scraping the IMD website

I am scraping this Indian weather website:

http://202.54.31.7/citywx/localwx.php

The left pane lists all the Indian states, and hovering over a state lets you select a city or district. I chose Delhi -> Safdarjung from the left pane and saved the page locally with:

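from BeautifulSoup import BeautifulSoup
import urllib2

# Download the page and keep a local copy
imd_ind = urllib2.urlopen('http://202.54.31.7/citywx/localwx.php')
delhi_info = imd_ind.read()
open('delhi_info.html', 'w').write(delhi_info)

# Parse the saved copy
soup = BeautifulSoup(open('delhi_info.html'))
# No parentheses: the shell shows the bound method's repr, which embeds the parsed document
soup.prettify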

This prints only the following:

<bound method BeautifulSoup.prettify of <html><head><title>Local Weather Forecast</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="MSHTML 5.00.2920.0" name="GENERATOR" /></head>
<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*"><frame name="menuFrame" noresize="noResize" src="menu.php" /><frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" /></frameset></html>
>

However, if I inspect the locally saved page "delhi_info.html" in Chrome, I can see a lot of information (date, temperature, cloudiness, and so on, i.e. many more HTML elements). Why can't I see any of it through the BeautifulSoup methods? Please help.

The HTML you downloaded is a frameset, not the weather page itself. Your saved HTML file contains this frame source:

src="http://202.54.31.7/citywx/city_weather1.php?id=42182"

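BeautifulSoup only parses the HTML it is given; it does not load the documents that the frames point to, which Chrome does automatically. So you need to extract this URL, open it, and then scrape the data from that page.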
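
The steps above could be sketched roughly as follows. This is only an illustration and swaps in the Python 3 standard library (html.parser, urllib) for the BeautifulSoup 3 / urllib2 code in the question; the same src lookup can be done with BeautifulSoup by searching for the frame tags. The names FrameSrcCollector, frame_urls and fetch are made up for this example, and the UTF-8 decoding is an assumption based on the meta tag of the frameset page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

PAGE_URL = 'http://202.54.31.7/citywx/localwx.php'


class FrameSrcCollector(HTMLParser):
    """Collects the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <frame ... />
        if tag in ('frame', 'iframe'):
            src = dict(attrs).get('src')
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return the absolute URLs of all frames found in html."""
    collector = FrameSrcCollector()
    collector.feed(html)
    # Relative values such as "menu.php" are resolved against the page URL
    return [urljoin(base_url, src) for src in collector.sources]


def fetch(url):
    """Download a URL and decode it (UTF-8 is an assumption here)."""
    with urlopen(url) as response:
        return response.read().decode('utf-8', errors='replace')


# Example use (needs network access to the site):
#   for url in frame_urls(fetch(PAGE_URL), PAGE_URL):
#       print(url)   # then parse fetch(url) to reach the forecast data
```

Resolving each src against the page URL matters because one of the two frames (menu.php) is given as a relative path, while the weather frame is already an absolute URL.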
