How to download all rows of data from a website using BeautifulSoup
I want to get some information from a weather page: https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295
Specifically, the hour and minutes:
<div class="entry-hour">
<span><span class="hour">0</span><span class="minutes">00</span></span>
</div>
the forecast temperature:
<span class="forecast-temp">9°C</span>
and the feels-like temperature:
<span class="forecast-feeltemp">Odczuwalna 4°C </span>
I'm stuck because I don't know how to get all the rows and the rest of the data. Thanks in advance for your help.
Here is my pseudocode:
#!/usr/bin/python3
import pymysql.cursors
from time import sleep, gmtime, strftime
import datetime
import pytz
from selenium import webdriver
from bs4 import BeautifulSoup

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
browser = webdriver.Chrome(
    ("/usr/bin/chromedriver"),
    chrome_options=options)
browser.get("https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295")
sleep(3)
source = browser.page_source  # Get the entire page source from the browser
if browser is not None:
    browser.close()  # No need for the browser so close it
soup = BeautifulSoup(source, 'html.parser')
try:
    Tags = soup.select('.weather-forecast-hbh-list')  # get the elements using css selectors
    for tag in Tags:  # loop through them
        hour = tag.find('div').find('span').text
        #minutes = ?
        #temp =?
        #feel_temp = ?
        print(hour + "\n")
except Exception as e:
    print(e)
One way is to loop over all div elements with the class weather-entry, extract the text from each one, and build a table structure along the way.
For example:
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate

page = requests.get('https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295').content
weather_entries = BeautifulSoup(page, "html.parser").find_all("div", {"class": "weather-entry"})


def extract_text(element, class_name):
    # Return the stripped text of the child div with the given class
    return element.find("div", class_=class_name).getText(strip=True)


div_classes = [
    "entry-hour",
    "entry-forecast",
    "entry-wind",
    "entry-precipitation",
    "entry-humidity",
]
# One table row per weather entry, one column per div class
table = [[extract_text(e, c) for c in div_classes] for e in weather_entries]
columns = ["Time:", "Forecast", "Wind", "Precipitation", "Humidity"]
print(tabulate(table, headers=columns, tablefmt="pretty"))
This outputs:
+-------+---------------------------------------+----------------------+---------------+----------+
| Time: | Forecast | Wind | Precipitation | Humidity |
+-------+---------------------------------------+----------------------+---------------+----------+
| 000 | -2°COdczuwalna 0°CBezchmurnie | S4km/hMax 4 km/h | | 97% |
| 100 | -2°COdczuwalna -1°CBezchmurnie | S4km/hMax 7 km/h | Zachm:10% | 98% |
| 200 | -2°COdczuwalna -1°CBezchmurnie | SSW4km/hMax 8 km/h | | 98% |
| 300 | -2°COdczuwalna -1°CBezchmurnie | S4km/hMax 7 km/h | | 98% |
| 400 | -2°COdczuwalna 1°CBezchmurnie | N0km/hMax 7 km/h | | 93% |
| 500 | -2°COdczuwalna 1°CBezchmurnie | N0km/hMax 6 km/h | | 99% |
| 600 | -2°COdczuwalna -1°CZachmurzenie duże | SSW4km/hMax 6 km/h | Zachm:76% | 92% |
| 700 | -1°COdczuwalna 3°CZachmurzenie duże | N0km/hMax 7 km/h | Zachm:76% | 84% |
| 800 | -3°COdczuwalna -1°CPochmurno | SSW4km/hMax 8 km/h | Zachm:91% | 99% |
| 900 | 3°COdczuwalna 5°CPochmurno | SSW4km/hMax 8 km/h | Zachm:91% | 79% |
| 1000 | 5°COdczuwalna 4°CPochmurno | S11km/hMax 11 km/h | Zachm:91% | 71% |
| 1100 | 6°COdczuwalna 5°CPochmurno | SSW11km/hMax 20 km/h | Zachm:100% | 65% |
| 1200 | 9°COdczuwalna 7°CPochmurno | S15km/hMax 25 km/h | Zachm:100% | 66% |
| 1300 | 10°COdczuwalna 8°CPrzelotne opady | S15km/hMax 25 km/h | Zachm:100% | 60% |
| 1400 | 11°COdczuwalna 8°CPochmurno | S18km/hMax 24 km/h | Zachm:100% | 55% |
| 1500 | 10°COdczuwalna 6°CPochmurno | S22km/hMax 27 km/h | Zachm:91% | 57% |
| 1600 | 10°COdczuwalna 6°CPochmurno | S22km/hMax 31 km/h | Zachm:91% | 60% |
| 1700 | 12°COdczuwalna 8°CPrzelotne opady | S18km/hMax 32 km/h | Zachm:100% | 53% |
| 1800 | 9°COdczuwalna 4°CCzęściowo słonecznie | S18km/hMax 33 km/h | Zachm:50% | 66% |
| 1900 | 8°COdczuwalna 4°CPochmurno | S15km/hMax 31 km/h | Zachm:100% | 82% |
| 2000 | 8°COdczuwalna 4°CPochmurno | S18km/hMax 22 km/h | Zachm:91% | 82% |
| 2100 | 9°COdczuwalna 5°CPrzelotne opady | SSW18km/hMax 22 km/h | Zachm:100% | 78% |
| 2200 | 8°COdczuwalna 4°CPochmurno | SSW15km/hMax 28 km/h | Zachm:100% | 80% |
| 2300 | 8°COdczuwalna 5°CPrzelotne opady | SSW11km/hMax 25 km/h | Zachm:91% | 81% |
+-------+---------------------------------------+----------------------+---------------+----------+
Obviously, you will need to do some parsing of the text values, but this should get you started.
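As a rough sketch of that parsing step, the snippet below splits the concatenated cells from the table above with regular expressions. It assumes the cells keep exactly the shape shown in the output (for example "-2°COdczuwalna 0°CBezchmurnie" and "SSW4km/hMax 8 km/h"). The names FORECAST_RE, WIND_RE, parse_forecast and parse_wind are made up for this illustration and are not part of BeautifulSoup or the site.

```python
import re

# Forecast cell, e.g. "-2°COdczuwalna 0°CBezchmurnie":
# temperature, the literal "Odczuwalna" (feels like), feels-like temperature, description.
# Assumption: the cell always has this shape, as in the sample output above.
FORECAST_RE = re.compile(
    r"^(?P<temp>-?\d+)°C\s*Odczuwalna\s*(?P<feel>-?\d+)°C\s*(?P<desc>.*)$"
)

# Wind cell, e.g. "SSW4km/hMax 8 km/h": direction, speed, maximum speed.
WIND_RE = re.compile(
    r"^(?P<dir>[A-Z]+)\s*(?P<speed>\d+)\s*km/h\s*Max\s*(?P<gust>\d+)\s*km/h$"
)


def parse_forecast(cell):
    """Split a forecast cell into typed values; return None if it does not match."""
    match = FORECAST_RE.match(cell.strip())
    if match is None:
        return None
    return {
        "temp_c": int(match.group("temp")),
        "feels_like_c": int(match.group("feel")),
        "description": match.group("desc").strip(),
    }


def parse_wind(cell):
    """Split a wind cell into direction and speeds; return None if it does not match."""
    match = WIND_RE.match(cell.strip())
    if match is None:
        return None
    return {
        "direction": match.group("dir"),
        "speed_kmh": int(match.group("speed")),
        "max_kmh": int(match.group("gust")),
    }


print(parse_forecast("-2°COdczuwalna 0°CBezchmurnie"))
print(parse_wind("SSW4km/hMax 8 km/h"))
```

Returning None for a cell that does not match lets the caller decide whether to skip the row or log it, which seems safer than raising if the page layout changes.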
Thanks, my friend, I've got it now ;) I had to fetch all the items first and then loop over them ;)
#!/usr/bin/python3
import requests
from bs4 import BeautifulSoup

page = requests.get('https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295').content
weather_entries = BeautifulSoup(page, "html.parser").find_all("div", {"class": "weather-entry"})
for weather_entry in weather_entries:
    # Each weather-entry div holds one hourly row
    hour = weather_entry.find('span', {'class': 'hour'}).text
    minutes = weather_entry.find('span', {'class': 'minutes'}).text
    temp = weather_entry.find('span', {'class': 'forecast-temp'}).text
    tempFeel = weather_entry.find('span', {'class': 'forecast-feeltemp'}).text
    print(hour + ":" + minutes + " \t " + temp + " \t " + tempFeel)
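If the values from the loop above are meant to be stored rather than printed, a small conversion step could turn the strings into typed values. This is only a sketch under two assumptions: the date in the URL is written as DD-MM-YYYY, and a naive timestamp with no timezone attached is acceptable. The helpers date_from_url and to_record are hypothetical names for this illustration.

```python
import re
from datetime import datetime

URL = "https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295"


def date_from_url(url):
    """Extract the date embedded in the archive URL (assumed to be DD-MM-YYYY)."""
    match = re.search(r"archiwum-pogody-(\d{2})-(\d{2})-(\d{4})", url)
    if match is None:
        raise ValueError("no date found in URL: " + url)
    day, month, year = (int(part) for part in match.groups())
    return year, month, day


def to_record(url, hour, minutes, temp, temp_feel):
    """Convert the scraped strings into (timestamp, temp_c, feels_like_c)."""
    year, month, day = date_from_url(url)
    # Naive datetime: no timezone is attached here (assumption, see lead-in).
    timestamp = datetime(year, month, day, int(hour), int(minutes))
    # Keep only the signed integer from strings like "9°C" or "Odczuwalna 4°C ".
    temp_c = int(re.search(r"-?\d+", temp).group())
    feels_like_c = int(re.search(r"-?\d+", temp_feel).group())
    return timestamp, temp_c, feels_like_c


# Sample values taken from the HTML snippets in the question.
print(to_record(URL, "0", "00", "9°C", "Odczuwalna 4°C "))
```

Inside the loop this would be called as to_record(URL, hour, minutes, temp, tempFeel), and the resulting tuple could then be handed to whatever storage layer is used.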
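I don't have much experience with BeautifulSoup, but the same can be achieved by scraping with Selenium WebDriver itself, using XPath. The code below can be used to extract the required details.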
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')
# Selenium 4 style: the driver path is passed via Service, the options via options=
browser = webdriver.Chrome(service=Service("/usr/bin/chromedriver"), options=options)
browser.get("https://pogoda.interia.pl/archiwum-pogody-08-10-2019,cId,21295")
# Wait up to 30 seconds until the hourly entries are present in the DOM
WebDriverWait(browser, 30).until(EC.presence_of_element_located((By.XPATH, "//div[@class='entry-hour']")))
weather_entries = browser.find_elements(By.XPATH, "//div[@class='weather-entry']")
for w in weather_entries:
    # The leading "." keeps each XPath relative to the current entry
    hour = w.find_element(By.XPATH, ".//div[@class='entry-hour']/span/span[@class='hour']").text
    temp = w.find_element(By.XPATH, ".//div[@class='entry-forecast']/div//span[@class='temp-info']/span[@class='forecast-temp']").text
    feeltemp = w.find_element(By.XPATH, ".//div[@class='entry-forecast']/div//span[@class='temp-info']/span[@class='forecast-feeltemp']").text
    print('hour ' + hour + ' temp ' + temp + ' feeltemp ' + feeltemp)
browser.quit()  # Close the browser once all rows have been read