
Selenium Python extracting information from a website and dumping it into JSON Format

I'm trying to open a hotel website, www.booking.com, and extract the name, price, location, and link from the top 50 search results, sorted cheapest first. I'm using Selenium with Python to automate the process. However, some HTML elements are targetable while others are not. After inspecting the website, I realized that all hotel names have the class name: fcab3ed991 a23c043802

I tried to target all of them and put them into an array, as seen in my code below, but I can't seem to target the element correctly. What am I doing wrong?

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not read as escapes
driver = webdriver.Chrome(PATH)
driver.get("https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB&sid=8005d0cc6b75af8d0d2e74451b73cb8b&aid=304142&sb=1&sb_lp=1&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB%26sid%3D8005d0cc6b75af8d0d2e74451b73cb8b%26sb_price_type%3Dtotal%26%26&ss=Jumeirah%2C+Dubai%2C+Dubai+Emirate%2C+United+Arab+Emirates&is_ski_area=&checkin_year=2022&checkin_month=8&checkin_monthday=1&checkout_year=2022&checkout_month=8&checkout_monthday=3&group_adults=2&group_children=0&no_rooms=1&map=1&b_h4u_keep_filters=&from_sf=1&ss_raw=jum&ac_position=1&ac_langcode=en&ac_click_type=b&dest_id=941&dest_type=district&place_id_lat=25.205553&place_id_lon=55.239216&search_pageview_id=c0ac477da63f02c2&search_pageview_id=c0ac477da63f02c2&search_selected=true&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0&order=price#map_closed")


try:
    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "d4924c9e74"))
    )

    element = WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.CLASS_NAME, "fcab3ed991 a23c043802"))
    )
    names=element.find_elements_by_class_name("fcab3ed991 a23c043802")
except:
    driver.quit()

To extract the text from the name and price fields, you can use list comprehensions with the following locator strategies:

  • Code block:

     driver.execute("get", {'url': 'https://www.booking.com/searchresults.html?label=gen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB&sid=8005d0cc6b75af8d0d2e74451b73cb8b&aid=304142&sb=1&sb_lp=1&src_elem=sb&error_url=https%3A%2F%2Fwww.booking.com%2Findex.html%3Flabel%3Dgen173nr-1FCAEoggI46AdIM1gEaAKIAQGYATG4ARfIAQzYAQHoAQH4AQKIAgGoAgO4AvqR75YGwAIB0gIkZDQ4MTdjZDctYzIyNC00N2RlLWJhYjItZDU1YTAwMGU2M2Q12AIF4AIB%26sid%3D8005d0cc6b75af8d0d2e74451b73cb8b%26sb_price_type%3Dtotal%26%26&ss=Jumeirah%2C+Dubai%2C+Dubai+Emirate%2C+United+Arab+Emirates&is_ski_area=&checkin_year=2022&checkin_month=8&checkin_monthday=1&checkout_year=2022&checkout_month=8&checkout_monthday=3&group_adults=2&group_children=0&no_rooms=1&map=1&b_h4u_keep_filters=&from_sf=1&ss_raw=jum&ac_position=1&ac_langcode=en&ac_click_type=b&dest_id=941&dest_type=district&place_id_lat=25.205553&place_id_lon=55.239216&search_pageview_id=c0ac477da63f02c2&search_pageview_id=c0ac477da63f02c2&search_selected=true&ac_suggestion_list_length=5&ac_suggestion_theme_list_length=0&order=price#map_closed'})
     names = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='title']")))]
     prices = [my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[data-testid='price-and-discounted-price'] > span")))]
     for i,j in zip(names, prices):
         print(f"{i} hotel price is {j}")
  • Note: You have to add the following imports:

     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support import expected_conditions as EC
  • Console Output:

     Royal Prestige Hotel hotel price is ₹ 10,871
     Rove La Mer Beach hotel price is ₹ 10,328
     Dubai Marine Beach Resort & Spa hotel price is ₹ 12,133
     Roda Beach Resort hotel price is ₹ 16,525
     Bespoke Residences - 3 Bedroom Waikiki Townhouses hotel price is ₹ 20,395
     Walking distance to Burj al Arab - 1BR Lamtara 2 hotel price is ₹ 16,724
     Mandarin Oriental Jumeira, Dubai hotel price is ₹ 18,108
     Four Seasons Resort Dubai at Jumeirah Beach hotel price is ₹ 20,003
     Bulgari Resort, Dubai hotel price is ₹ 78,274
     Spacious Villa! hotel price is ₹ 62,619
     Palm Beach Hotel hotel price is ₹ 64,794
     York International Hotel hotel price is ₹ 86,971
     Moon , Backpackers , Partition for Couples and for singles hotel price is ₹ 208,731
     Hafez Hotel Apartments Al Ras Metro Station hotel price is ₹ 2,022
     Grand Pearl Hostel For Boys hotel price is ₹ 2,131
     Time Palace Hotel Branch hotel price is ₹ 3,131
     Hostel Youth hotel price is ₹ 3,157
     Grand Mayfair Hotel hotel price is ₹ 3,601
     Explore Old Dubai, Souks, Tastings, Museums hotel price is ₹ 4,592
     Panorama Hotel Bur Dubai hotel price is ₹ 3,674
     Zain International Hotel hotel price is ₹ 3,827
     Panorama Hotel Deira hotel price is ₹ 3,870
     Decent Boys Hostel in center of Bur Dubai next to Burjuman metro Station with all FREE Facilities hotel price is ₹ 3,875
     Brand New Boys Hostel 1 min walk from Burjuman Metro Station EXIT-4 with all Brand New Furnishings & Free Facilities hotel price is ₹ 3,914
     OYO 338 Transworld Hotel hotel price is ₹ 3,914
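
As for why the locator in the question likely fails: By.CLASS_NAME expects a single class name, while "fcab3ed991 a23c043802" is two class names separated by a space. One way around that is a CSS selector that chains the classes. The helper below is a minimal sketch of that conversion; compound_class_to_css is a hypothetical name used for illustration, not part of Selenium.

```python
def compound_class_to_css(class_attr, tag=""):
    """Build a CSS selector that matches elements carrying every class in class_attr.

    class_attr is the raw value of an HTML class attribute (space-separated).
    tag is an optional tag name to prefix, e.g. "div".
    """
    classes = class_attr.split()
    if not classes:
        raise ValueError("class_attr must contain at least one class name")
    # ".a.b" means "has class a AND class b" on the same element
    return tag + "".join("." + name for name in classes)


# Class names taken from the question
selector = compound_class_to_css("fcab3ed991 a23c043802")
print(selector)
```

The resulting string can be passed to driver.find_elements(By.CSS_SELECTOR, selector). Generated class names like these may change between site builds, which is presumably why the code block above targets data-testid attributes instead.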

PS: Following this solution, you can similarly extract the location and link as well, and dump everything in JSON format.
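
The JSON step could look roughly like the sketch below. It assumes you already have four parallel lists of strings (names, prices, locations, links) collected with the same WebDriverWait pattern; the function names build_records, parse_price, and to_json, the record keys, and the sample location and link values are all hypothetical and only serve the illustration.

```python
import json
import re


def parse_price(text):
    """Turn a price label such as '₹ 10,871' into an int.

    Assumption: the label holds a whole number with optional thousands
    separators; anything that is not an ASCII digit is dropped.
    """
    digits = re.sub(r"[^0-9]", "", text)
    return int(digits) if digits else None


def build_records(names, prices, locations, links, limit=50):
    """Zip the parallel lists into one dict per hotel, keeping at most `limit`."""
    records = []
    for name, price, location, link in zip(names, prices, locations, links):
        records.append({
            "name": name,
            "price_text": price,          # label exactly as scraped
            "price": parse_price(price),  # numeric value, handy for sorting
            "location": location,
            "link": link,
        })
    return records[:limit]


def to_json(records):
    """Serialize the records; ensure_ascii=False keeps symbols like ₹ readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Name and price come from the console output above;
    # the location and link are placeholders, not real scraped values.
    demo = build_records(
        ["Royal Prestige Hotel"],
        ["₹ 10,871"],
        ["<location text>"],
        ["https://example.com/hotel-1"],
    )
    print(to_json(demo))
```

To write a file instead of printing, open it with encoding="utf-8" and pass the handle to json.dump with the same ensure_ascii=False and indent arguments. Note that zip stops at the shortest list, so a hotel missing one field would shift the pairing; scraping each result card as a unit would avoid that.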
