
BeautifulSoup scraping from a hotel's website returns nothing

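I'm trying to scrape hotel data from different hotel websites. I can successfully scrape information from sites like Bookings.com, but I'm struggling to get any output for a specific hotel's own website (not a bulk booking site).

The code below works for the bulk booking sites, but when I change the URL and the div class name I'm trying to scrape, I get no output. Have I chosen the wrong div class for the information I want, or is it not possible to scrape this type of website?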

from bs4 import BeautifulSoup
import requests

headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'}
url = 'https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021&rooms=1&adults=1&languageid=1#/accommodation/room'

response=requests.get(url, headers=headers)

soup=BeautifulSoup(response.content, "lxml")


for item in soup.select('.CardList-summary'):
    print(item.string)

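As suggested, you can do this with Selenium, which drives a real browser and so lets the page's JavaScript run before you read the elements. The ng-binding class suggests the room list is filled in by AngularJS after the page loads, which would explain why the HTML that requests receives has nothing to select. Your locator is also wrong: the correct one for the summary cards is .CardList-summary.CardList-summary-multirate. The example below waits for and reads the room titles with .CardList-summary-title.ng-binding: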

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021&rooms=1&adults=1&languageid=1#/accommodation/room'
title_selector = ".CardList-summary-title.ng-binding"

# Selenium 4 takes the driver path through a Service object; adjust the path for your system.
driver = webdriver.Chrome(service=Service('/snap/bin/chromium.chromedriver'))
try:
    driver.get(url)
    # Wait up to 15 seconds for the JavaScript-rendered room titles to become visible.
    results = WebDriverWait(driver, 15).until(
        EC.visibility_of_all_elements_located((By.CSS_SELECTOR, title_selector))
    )
    for res in results:
        print(res.text)
finally:
    # quit() closes every window and ends the driver process, even after an error.
    driver.quit()

Output:

Standard Sleeper Double
Standard Double Room
Standard Twin Room
Standard Twin with Tower Bridge View
Executive Double Room
Executive Twin Room
Suite Double Room
Family Room Sleeps 3
Accessible Double Room
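
To tell a wrong class name apart from a page that is filled in by JavaScript, it can help to check whether the class appears at all in the HTML the server returns. Below is a minimal sketch using only the standard library. The ClassCollector and classes_in names are made up for illustration, and the two HTML strings are hypothetical stand-ins, not the real markup of the hotel site.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ClassCollector(HTMLParser):
    """Collect every CSS class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())


def classes_in(html):
    """Return the set of class names present in the given HTML string."""
    collector = ClassCollector()
    collector.feed(html)
    collector.close()
    return collector.classes


# Hypothetical stand-ins: an empty app shell such as requests might receive,
# and the markup a browser might show after the scripts have run.
static_html = '<html><body><div class="app-root" ng-view></div></body></html>'
rendered_html = (
    '<div class="CardList-summary CardList-summary-multirate">'
    '<span class="CardList-summary-title ng-binding">Standard Double Room</span>'
    '</div>'
)

print("CardList-summary" in classes_in(static_html))    # False
print("CardList-summary" in classes_in(rendered_html))  # True

# The part of the URL after '#' is handled by the browser and is not sent
# to the server, so requests fetches the same resource with or without it.
url = ('https://bookings.guoman.com/100259?datein=06/05/2021&dateout=06/08/2021'
       '&rooms=1&adults=1&languageid=1#/accommodation/room')
print(urlsplit(url).fragment)  # /accommodation/room
```

To try this on the real page, pass response.text from the question's code to classes_in. If the class is missing there but visible in the browser's inspector, the content is most likely added by JavaScript, and a browser-driven tool such as Selenium is the more suitable choice.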

