
How to get a table and its elements with Python/Selenium

I'm trying to get all the prices in the table at this URL: https://www.skyscanner.it/trasporti/voli/bud/rome/?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=27539793&inboundaltsenabled=true&infants=0&iym=2208&originentityid=27539604&outboundaltsenabled=true&oym=2208&preferdirects=false&ref=home&rtn=1&selectedoday=01&selectediday=01 The table cells are the days of the month, each with its related price.

This is what I have tried in order to get the table:

# Attempt 1
week = table.find_element(By.CLASS_NAME, "BpkCalendarGrid_bpk-calendar-grid__NzBmM month-view-grid--data-loaded")

# Attempt 2
table = driver.find_element(by=By.XPATH, value="XPath copied using Chrome inspector")

However, neither attempt finds the table. What is the correct way to extract all the prices from this table? Thanks!

You can grab the table data, meaning all the prices, using Selenium together with a pandas DataFrame. The page contains two tables that hold price data.

from io import StringIO

import pandas as pd
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

option = webdriver.ChromeOptions()
option.add_argument("start-maximized")

# Keep Chrome open after the script finishes
option.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get('https://www.skyscanner.it/trasporti/voli/bud/rome/?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=27539793&inboundaltsenabled=true&infants=0&iym=2208&originentityid=27539604&outboundaltsenabled=true&oym=2208&preferdirects=false&ref=home&rtn=1&selectedoday=01&selectediday=01')

# Wait until each calendar table is visible, then take its HTML
table = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '(//table)[1]'))).get_attribute("outerHTML")
table_2 = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '(//table)[2]'))).get_attribute("outerHTML")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="acceptCookieButton"]'))).click()

# read_html returns a list of DataFrames; wrapping the HTML in StringIO
# avoids the literal-string deprecation warning in recent pandas versions
df1 = pd.read_html(StringIO(table))[0]
print(df1)

df2 = pd.read_html(StringIO(table_2))[0]
print(df2)

Output:

      lun     mar     mer     gio     ven     sab     dom
0   1€ 40   2€ 28   3€ 32   4€ 37   5€ 34   6€ 35   7€ 34
1   8€ 34   9€ 28  10€ 27  11€ 26  12€ 26  13€ 46  14€ 35
2  15€ 35  16€ 40  17€ 36  18€ 51  19€ 28  20€ 33  21€ 36
3  22€ 38  23€ 38  24€ 30  25€ 50  26€ 43  27€ 50  28€ 51
4  29€ 38  30€ 36  31€ 58      1-      2-      3-      4-
5      5-      6-      7-      8-      9-     10-     11-
      lun     mar     mer     gio     ven     sab     dom
0   1€ 40   2€ 28   3€ 32   4€ 37   5€ 34   6€ 35   7€ 34
1   8€ 34   9€ 28  10€ 27  11€ 26  12€ 26  13€ 46  14€ 35
2  15€ 35  16€ 40  17€ 36  18€ 51  19€ 28  20€ 33  21€ 36
3  22€ 38  23€ 38  24€ 30  25€ 50  26€ 43  27€ 50  28€ 51
4  29€ 38  30€ 36  31€ 58      1-      2-      3-      4-
5      5-      6-      7-      8-      9-     10-     11-
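Each cell in the output above fuses the day number with its price (for example `1€ 40`), and days without a price end in `-`. The following is a minimal sketch of how such cell strings could be split into a day and a price. The helper name `parse_cell` is hypothetical, and the pattern assumes cells look exactly like the ones printed above (whole-euro prices, no thousands separators).

```python
import re
from typing import Optional, Tuple

# Assumed cell formats, taken from the printed output: "<day>€ <price>" or "<day>-"
_CELL = re.compile(r"^\s*(\d{1,2})\s*(?:€\s*(\d+)|-)?\s*$")


def parse_cell(text: str) -> Tuple[int, Optional[int]]:
    """Split a calendar cell such as '1€ 40' into (day, price).

    The price is None when the cell carries no price (e.g. '11-').
    """
    match = _CELL.match(text)
    if match is None:
        raise ValueError(f"unexpected cell format: {text!r}")
    day = int(match.group(1))
    price = int(match.group(2)) if match.group(2) else None
    return day, price


# One row copied from the output above
row = ["29€ 38", "30€ 36", "31€ 58", "1-", "2-", "3-", "4-"]
parsed = [parse_cell(cell) for cell in row]
print(parsed)
```

With a DataFrame such as `df1`, the same helper could be applied to every cell, provided all cell values are strings in one of these two formats.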


Alternative solution (table 1): you can extract the prices from the second table in the same way.

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager

option = webdriver.ChromeOptions()
option.add_argument("start-maximized")

# Keep Chrome open after the script finishes
option.add_experimental_option("detach", True)

driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get('https://www.skyscanner.it/trasporti/voli/bud/rome/?adults=1&adultsv2=1&cabinclass=economy&children=0&childrenv2=&destinationentityid=27539793&inboundaltsenabled=true&infants=0&iym=2208&originentityid=27539604&outboundaltsenabled=true&oym=2208&preferdirects=false&ref=home&rtn=1&selectedoday=01&selectediday=01')

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, '//*[@id="acceptCookieButton"]'))).click()

# Collect every cell of the first calendar table
cells = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '(//table)[1]/tbody/tr/td')))

for cell in cells:
    # find_elements returns an empty list for a cell without a price
    # instead of raising NoSuchElementException
    prices = cell.find_elements(By.XPATH, './/div[@class="price"]')
    if prices:
        print(prices[0].text.replace('€', '').strip())

Output:

39
30
32
37
34
35
34
34
28
27
26
26
46
35
35
40
36
52
29
34
37
39
39
30
50
44
50
52
38
36
58
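
A note on Attempt 1 from the question: `By.CLASS_NAME` expects a single class name, so a value containing two space-separated names will not match the grid element. A CSS selector that chains the class names is the usual workaround. The sketch below only builds that selector string; the helper name `css_for_classes` is hypothetical, and the class names are the ones quoted in the question, which look auto-generated and may have changed on the live site.

```python
def css_for_classes(class_attr: str) -> str:
    """Turn a space-separated class attribute into a chained CSS class selector."""
    return "".join("." + name for name in class_attr.split())


# Class names as quoted in the question (they may differ on the current page)
selector = css_for_classes(
    "BpkCalendarGrid_bpk-calendar-grid__NzBmM month-view-grid--data-loaded"
)
print(selector)

# With a live driver, the selector would be used like this:
# grid = driver.find_element(By.CSS_SELECTOR, selector)
```

Because generated class names tend to change between site releases, the positional XPath used in the answer, `(//table)[1]`, may prove more stable than either class name.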
