Some issues with web scraping the IMD website

I was scraping this Indian weather website:

http://202.54.31.7/citywx/localwx.php

In the left pane you can see all the Indian states, and if you hover over one you can select its cities/districts. I chose Delhi -> Safdarjung from the left pane and saved the page locally like this:

from BeautifulSoup import BeautifulSoup
import urllib, urllib2

imd_ind = urllib2.urlopen('http://202.54.31.7/citywx/localwx.php')
delhi_info = imd_ind.read()
open('delhi_info.html', 'w').write(delhi_info)
soup = BeautifulSoup(open('delhi_info.html'))
soup.prettify

This prints only the following:

<bound method BeautifulSoup.prettify of <html><head><title>Local Weather Forecast</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="MSHTML 5.00.2920.0" name="GENERATOR" /></head>
<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*"><frame name="menuFrame" noresize="noResize" src="menu.php" /><frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" /></frameset></html>
>

However, if I inspect the locally saved page "delhi_info.html" in Chrome, I can see a lot of information such as date, temperature, cloud cover and so on (i.e. lots of HTML elements). Why can't I see them through any of the BeautifulSoup methods? Please help.

You have a frame element in the HTML. Your saved HTML file contains this attribute:

src="http://202.54.31.7/citywx/city_weather1.php?id=42182"

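BeautifulSoup can't scrape the contents of this frame: it only parses the HTML you hand it and does not download the page a frame points to. Chrome shows the weather data because the browser fetches the frame's URL for you. So you need to extract this URL, open it yourself, and then scrape the data from that response.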
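The steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: it uses Python 3 and the standard library's `HTMLParser` instead of BeautifulSoup so that it runs without extra packages, and the names `FrameCollector`, `frame_urls` and `fetch_frame_pages` are made up for illustration. It also assumes the pages are UTF-8, as the `meta` tag in the output above declares.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <frame ... /> also end up here,
        # because HTMLParser routes them through handle_starttag.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(html, base_url):
    """Return absolute URLs for all frames found in the given HTML."""
    collector = FrameCollector()
    collector.feed(html)
    collector.close()
    # Relative values like "menu.php" are resolved against the outer page.
    return [urljoin(base_url, src) for src in collector.sources]


def fetch_frame_pages(page_url):
    """Download the outer page, then download each frame it references.

    Returns a dict mapping each frame URL to its HTML text.
    Assumption: the server sends UTF-8, as the page's meta tag claims.
    """
    outer = urlopen(page_url).read().decode("utf-8", errors="replace")
    pages = {}
    for url in frame_urls(outer, page_url):
        pages[url] = urlopen(url).read().decode("utf-8", errors="replace")
    return pages
```

Calling `fetch_frame_pages('http://202.54.31.7/citywx/localwx.php')` should give you the HTML of both the menu frame and the main weather frame, assuming the site is still reachable. You can then pass the main frame's HTML to BeautifulSoup and look for the date and temperature elements there.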
