Some problems web scraping the IMD website

Question

I was scraping this Indian weather website:

http://202.54.31.7/citywx/localwx.php

The left pane lists all the Indian states, and if you hover over one you can select its cities/districts. I chose Delhi->safdarjung from the left pane and saved the page locally as follows:

from BeautifulSoup import BeautifulSoup
import urllib, urllib2

imd_ind = urllib2.urlopen('http://202.54.31.7/citywx/localwx.php')
delhi_info = imd_ind.read()
open('delhi_info.html', 'w').write(delhi_info)
soup = BeautifulSoup(open('delhi_info.html'))
soup.prettify

This prints only the following:

<bound method BeautifulSoup.prettify of <html><head><title>Local Weather Forecast</title>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type" />
<meta content="MSHTML 5.00.2920.0" name="GENERATOR" /></head>
<frameset border="0" cols="330,611*" frameborder="NO" framespacing="0" rows="*"><frame name="menuFrame" noresize="noResize" src="menu.php" /><frame name="mainframe" src="http://202.54.31.7/citywx/city_weather1.php?id=42182" /></frameset></html>
>

Whereas if I inspect the locally saved page "delhi_info.html" in Chrome, I can see a lot of information (date, temperature, cloud cover, etc., i.e. lots of HTML elements). Why can't I see them via any of the BeautifulSoup methods? Please help.

Answer 1

You have a frame element in the HTML. Your saved HTML file contains this attribute:

src="http://202.54.31.7/citywx/city_weather1.php?id=42182"

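BeautifulSoup only parses the HTML you hand it; it does not fetch the document that a frame points to. Chrome loads that document for you, which is why you see the weather data there. So you need to extract this URL, open it, and then scrape the data from that page.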
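
A minimal sketch of the "extract this URL, open it" step, written for Python 3. To keep it dependency-free it uses the standard library's `html.parser` instead of BeautifulSoup for pulling out the frame URLs; the names `FrameCollector`, `frame_urls` and `fetch_frame_pages` are made up for this example, and decoding as UTF-8 is an assumption based on the page's `charset=utf-8` meta tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class FrameCollector(HTMLParser):
    """Collect the src attribute of every <frame> and <iframe> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <frame ... /> also end up here by default.
        if tag in ("frame", "iframe"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def frame_urls(page_html, base_url):
    """Return absolute URLs of all frames referenced in page_html."""
    collector = FrameCollector()
    collector.feed(page_html)
    # Relative values like "menu.php" are resolved against the outer page URL.
    return [urljoin(base_url, src) for src in collector.sources]


def fetch_frame_pages(page_url):
    """Download page_url and each frame document it references (needs network)."""
    with urlopen(page_url) as response:
        outer_html = response.read().decode("utf-8", errors="replace")
    pages = {}
    for url in frame_urls(outer_html, page_url):
        with urlopen(url) as response:
            pages[url] = response.read().decode("utf-8", errors="replace")
    return pages
```

Calling `fetch_frame_pages('http://202.54.31.7/citywx/localwx.php')` should give you a dict keyed by frame URL, and the HTML stored under the `city_weather1.php?id=42182` key is what you would then pass to BeautifulSoup. If you would rather stay with BeautifulSoup for the extraction too, `soup.findAll('frame')` (or `find_all` in bs4) gives you the same frame tags to read `src` from.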
