BeautifulSoup scraping from a hotel's website returns nothing
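I am trying to scrape hotel data from different hotel websites. I can successfully scrape information from sites like Bookings.com, but I am struggling to get any output for a specific hotel's own website (not a bulk booking site).
The code below works for bulk booking sites, but when I change the URL and the div class name to the ones I am trying to scrape, I get no output. Am I selecting the wrong div class for the information I want, or is it not possible to scrape these kinds of websites?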
from bs4 import BeautifulSoup
import requests
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'}
url = 'https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021&rooms=1&adults=1&languageid=1#/accommodation/room'
response=requests.get(url, headers=headers)
soup=BeautifulSoup(response.content, "lxml")
for item in soup.select('.CardList-summary'):
    print(item.string)
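One likely cause is worth noting about the URL itself: everything after `#` is a fragment, which is handled by the page's own JavaScript and is never sent to the server. That suggests the room list is built client-side after the page loads, so the HTML that `requests` receives would not contain it yet; this is an inference, not something confirmed in the thread. A minimal sketch, using only the standard library, of which part of the URL actually reaches the server (`split_request_url` is a hypothetical helper name):

```python
from urllib.parse import urldefrag


def split_request_url(url):
    """Return (part sent to the server, client-side fragment)."""
    base, fragment = urldefrag(url)
    return base, fragment


url = ('https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021'
       '&rooms=1&adults=1&languageid=1#/accommodation/room')

sent, fragment = split_request_url(url)
print(sent)      # the URL an HTTP client requests
print(fragment)  # the route resolved later by JavaScript in the browser
```

If the fragment is what selects the room view, a plain HTTP fetch cannot reach it, whatever selector is used afterwards.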
As suggested, you can do this with Selenium. Also, your locator is wrong, at the very least. The correct locator is: .CardList-summary.CardList-summary-multirate. See this example:
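from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021&rooms=1&adults=1&languageid=1#/accommodation/room'
title_selector = ".CardList-summary-title.ng-binding"

# Path to chromedriver on the answerer's machine; adjust it for your system.
# Selenium 4 takes the driver path via Service instead of executable_path.
driver = webdriver.Chrome(service=Service('/snap/bin/chromium.chromedriver'))
try:
    driver.get(url)
    # Wait up to 15 seconds for the room titles to become visible.
    WebDriverWait(driver, 15).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, title_selector)))
    # find_elements_by_css_selector was removed in Selenium 4; use find_elements with By.
    results = driver.find_elements(By.CSS_SELECTOR, title_selector)
    for res in results:
        print(res.text)
finally:
    # quit() closes every window and ends the driver session.
    driver.quit()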
Output:
Standard Sleeper Double
Standard Double Room
Standard Twin Room
Standard Twin with Tower Bridge View
Executive Double Room
Executive Twin Room
Suite Double Room
Family Room Sleeps 3
Accessible Double Room
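Even with the rendered HTML in hand, the loop from the question would probably still print `None`: BeautifulSoup's `.string` is `None` for any tag that has more than one child, and a summary card normally wraps several elements. A minimal sketch, assuming the rendered markup resembles the made-up `SAMPLE_HTML` below (in practice the HTML would come from `driver.page_source`; the `CardList-summary-description` element and the `extract_titles` helper are hypothetical and only illustrate the point):

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source; the real markup may differ.
SAMPLE_HTML = """
<div class="CardList-summary CardList-summary-multirate">
  <h3 class="CardList-summary-title ng-binding">Standard Double Room</h3>
  <p class="CardList-summary-description">Sleeps 2</p>
</div>
"""


def extract_titles(html):
    """Return the stripped text of every room-title element in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [tag.get_text(strip=True)
            for tag in soup.select(".CardList-summary-title")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
card = soup.select_one(".CardList-summary")
print(card.string)                  # None, because the card has several children
print(extract_titles(SAMPLE_HTML))  # text taken from the title elements instead
```

Selecting the title element directly, or calling `get_text()` on the card, avoids the empty result that `.string` gives on a container.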