
Having trouble with BeautifulSoup in Python

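I'm new to Python and am having trouble with the code below. I'm trying to get the temperature or the date from a website, but I can't seem to get any output. I've tried many variations and still can't get it right.

Thanks for your help!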

# Code below:
import requests, bs4
r = requests.get('http://www.hko.gov.hk/contente.htm')
print(r.raise_for_status())
hkweather = bs4.BeautifulSoup(r.text, 'html.parser')
print(hkweather.select('div left_content fnd_day fnd_date'))

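Your CSS selector is incorrect: you need a . between the tag name and the CSS class. The tags you want are the divs with the class fnd_day, inside the div whose id is fnd_content: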

divs = soup.select("#fnd_content div.fnd_day")
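
To see the difference the dots make, here is a minimal sketch run against a small hand-written HTML snippet. The markup is an assumption that only mimics the structure described above (a fnd_content div containing fnd_day divs); it is not the real page, and the nested fnd_date divs with text are there purely for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, simplified from the structure described above.
html = """
<div id="fnd_content">
  <div class="fnd_day"><div class="fnd_date">July 20</div></div>
  <div class="fnd_day"><div class="fnd_date">July 21</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Without dots, each word is read as a tag name, so nothing matches.
no_match = soup.select("div fnd_day fnd_date")

# "#name" selects by id, "tag.name" by tag and class; a space means "descendant of".
dates = [d.get_text() for d in soup.select("#fnd_content div.fnd_day div.fnd_date")]

print(no_match)
print(dates)
```

The first selector looks for <fnd_day> and <fnd_date> tags, which do not exist, so it comes back empty even when the data is present.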

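However, that still won't get you the data, because it is generated dynamically by an ajax request. You can fetch all of the data in JSON format with the following code: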

import requests
from pprint import pprint as pp

u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_=1468955579991"

data = requests.get(u).json()

pp(data)

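That returns pretty much all of the dynamic content, including the dates, temperatures, etc.

If you access the key F9D, you can see a general weather description along with all of the temperatures and dates: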

from pprint import pprint as pp

pp(data['F9D'])

Output:

{'BulletinDate': '20160720',
 'BulletinTime': '0315',
 'GeneralSituation': 'A southwesterly airstream will bring showers to the '
                     'coast of Guangdong today. Under the dominance of an '
                     'upper-air anticyclone, it will be generally fine and '
                     'very hot over southern China in the latter part of this '
                     'week and early next week.',
 'NPTemp': '25',
 'WeatherForecast': [{'ForecastDate': '20160720',
                      'ForecastIcon': 'pic53.png',
                      'ForecastMaxrh': '95',
                      'ForecastMaxtemp': '32',
                      'ForecastMinrh': '70',
                      'ForecastMintemp': '26',
                      'ForecastWeather': 'Sunny periods and a few showers. '
                                         'Isolated squally thunderstorms at '
                                         'first.',
                      'ForecastWind': 'South to southwest force 4.',
                      'IconDesc': 'Sunny Periods with A Few Showers',
                      'WeekDay': '3'},
                     {'ForecastDate': '20160721',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '4'},
                     {'ForecastDate': '20160722',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '5'},
                     {'ForecastDate': '20160723',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '34',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Fine and very hot.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '6'},
                     {'ForecastDate': '20160724',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '34',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Fine and very hot.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '0'},
                     {'ForecastDate': '20160725',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '1'},
                     {'ForecastDate': '20160726',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '2'},
                     {'ForecastDate': '20160727',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '3'},
                     {'ForecastDate': '20160728',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '4'}]}
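
As a rough sketch of pulling the dates and temperatures out of that structure, the snippet below works on a trimmed copy of the output above rather than on a live response. The helper name summarize is made up for illustration, and it assumes every entry carries the ForecastDate, ForecastMintemp and ForecastMaxtemp keys shown in the output.

```python
from datetime import datetime

# Two entries trimmed from the output above; a live response has more keys.
sample = {
    "BulletinDate": "20160720",
    "WeatherForecast": [
        {"ForecastDate": "20160720", "ForecastMaxtemp": "32", "ForecastMintemp": "26"},
        {"ForecastDate": "20160721", "ForecastMaxtemp": "33", "ForecastMintemp": "28"},
    ],
}

def summarize(f9d):
    """Return (ISO date, min temp, max temp) tuples for each forecast day."""
    rows = []
    for day in f9d["WeatherForecast"]:
        # Dates arrive as 'YYYYMMDD' strings and temperatures as numeric strings.
        date = datetime.strptime(day["ForecastDate"], "%Y%m%d").date()
        rows.append((date.isoformat(),
                     int(day["ForecastMintemp"]),
                     int(day["ForecastMaxtemp"])))
    return rows

rows = summarize(sample)
for date, low, high in rows:
    print("{}: min {} / max {}".format(date, low, high))
```

With the real response this would be called as summarize(data['F9D']).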

The only query string parameter is an epoch timestamp, which you can generate with the time lib:

import requests
from time import time

# Epoch time in milliseconds, matching the 13-digit value in the URL above
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_={}".format(int(time() * 1000))

data = requests.get(u).json()

Passing no timestamp returns the same data too, so I'll let you investigate its significance.

I was able to get the dates:
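
One possible reading, offered as an assumption rather than something confirmed for this site: a _=<timestamp> parameter resembles the cache-busting value that jQuery appends to GET requests when caching is turned off, which would explain why the server ignores it. The value in the earlier URL has 13 digits, which suggests milliseconds rather than seconds. A small sketch of building such a URL follows; the function name cache_busted_url is hypothetical.

```python
from time import time
from urllib.parse import urlencode

BASE = "http://www.hko.gov.hk/wxinfo/json/one_json.xml"

def cache_busted_url(base, now=None):
    """Return base with a `_` parameter set to the epoch time in milliseconds.

    `now` is an optional epoch time in seconds, mainly useful for testing.
    """
    seconds = time() if now is None else now
    return "{}?{}".format(base, urlencode({"_": int(seconds * 1000)}))

# A fixed time keeps the example reproducible; omit `now` to use the clock.
fixed = cache_busted_url(BASE, now=1468955579)
live = cache_busted_url(BASE)
print(fixed)
print(live)
```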

>>> import requests, bs4
>>> r = requests.get('http://www.hko.gov.hk/contente.htm')
>>> hkweather = bs4.BeautifulSoup(r.text, 'html.parser')
>>> print(hkweather.select('div[class="fnd_date"]'))
# [<div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>]

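But the text is missing. This doesn't appear to be a BeautifulSoup problem, because I looked through r.text myself and all I see is <div class="fnd_date"></div> rather than something like <div class="fnd_date">July 20</div>.

You can check whether the text is there using a regular expression (although using regex on HTML is frowned upon):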

>>> import re
>>> re.findall(r'<[^<>]*fnd_date[^<>]*>[^>]*>', r.text)
# [u'<div id="fnd_date" class="date"></div>', ... repeated 10 times]
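
To show what that pattern picks up, here is a small sketch that runs it against two hand-written strings: one empty div like those in r.text and one with the text filled in. The has_text helper and its capture-group variant of the pattern are assumptions added for illustration, not part of the original check.

```python
import re

# The pattern from above: a tag mentioning fnd_date, then everything up to the next '>'.
TAG_RE = re.compile(r'<[^<>]*fnd_date[^<>]*>[^>]*>')
# Hypothetical variant that captures only the text between the two tags.
TEXT_RE = re.compile(r'<[^<>]*fnd_date[^<>]*>([^<>]*)<')

def has_text(html):
    """Return True if any fnd_date element in html contains non-blank text."""
    return any(text.strip() for text in TEXT_RE.findall(html))

empty = '<div class="fnd_date"></div>'
filled = '<div class="fnd_date">July 20</div>'

print(TAG_RE.findall(empty))
print(TAG_RE.findall(filled))
print(has_text(empty), has_text(filled))
```

If has_text(r.text) comes back False, the dates are not in the HTML the server sent, which fits with them being filled in later by JavaScript.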

