Scraping dynamic data from a table using Python, BeautifulSoup, and Selenium

Question

I want to scrape all the URL links for the football matches listed in the table on this website.

Here is the code:

from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Firefox()
url = 'https://www.coteur.com/cotes-foot.php'
driver.get(url)

fixture1 = driver.find_element_by_xpath("/html/body/div[3]/div/div[2]/div/div/div[2]/div/table/tbody/tr[3]/td[3]/a")
print(fixture1.text)

links = []
i = 3
while i <= 6:
    fixture = driver.find_element_by_xpath("/html/body/div[3]/div/div[2]/div/div/div[2]/div/table/tbody/tr[" + str(i) + "]/td[3]/a")
    links.append(fixture)
    i = i + 3

print(links)

driver.close()

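When I scrape a single match, it returns the data I expect. However, when I try to loop over all the football matches, I run into a problem.

Here is the output of the code: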

Betis Seville - Granada 74 Cf
[<selenium.webdriver.firefox.webelement.FirefoxWebElement (session="0199958a-4d31-4a21-9856-8f8c3cc8ee05", element="158fcdaf-501f-41a4-9550-8a42543acc22")>, <selenium.webdriver.firefox.webelement.FirefoxWebElement (session="0199958a-4d31-4a21-9856-8f8c3cc8ee05", element="74e67896-fccb-48da-8eef-bbf8d9a6f3b3")>]

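I expected the same kind of output as for the first element, but the list prints WebElement objects instead of the match text.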

Answer 1

This works well:

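    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    driver.get("https://www.coteur.com/cotes-foot.php")

    # Select every match link by its href instead of an absolute row XPath.
    # find_elements(By.XPATH, ...) replaces find_elements_by_xpath, removed in Selenium 4.3.
    links = driver.find_elements(By.XPATH, '//a[contains(@href, "match/cotes-")]')

    # Read the visible text of each element.
    data = [l.text for l in links]

    print(data)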
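
The question asks for the URL of each match, while the snippet above collects only the link text. Below is a minimal sketch of one way to get both, assuming Selenium 4 and that every matched element is an `<a>` tag with an `href`. The helper names `extract_fixtures` and `scrape_fixtures` are hypothetical and not part of Selenium. It also illustrates why `print(links)` in the question shows `FirefoxWebElement (...)` entries: the list holds element objects, and strings only appear once `.text` or `get_attribute("href")` is read from each one.

```python
from typing import Iterable, List, Tuple


def extract_fixtures(elements: Iterable) -> List[Tuple[str, str]]:
    """Return (text, href) pairs for anchor-like elements.

    Works with Selenium WebElement objects, which expose `.text`
    and `.get_attribute(name)`.
    """
    return [(el.text, el.get_attribute("href")) for el in elements]


def scrape_fixtures(url: str) -> List[Tuple[str, str]]:
    """Open `url` in Firefox and collect the match links (hypothetical helper)."""
    # Imported lazily so extract_fixtures can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        anchors = driver.find_elements(
            By.XPATH, '//a[contains(@href, "match/cotes-")]'
        )
        # Read text and href while the browser session is still open.
        return extract_fixtures(anchors)
    finally:
        driver.quit()
```

Calling `scrape_fixtures("https://www.coteur.com/cotes-foot.php")` would then return a list of `(text, href)` tuples, provided the page still uses links containing `match/cotes-`.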

Answer 2

I tried your code, and this is the result:

File "./coteur2.py", line 17
    data = [l.text for l in links]
    ^
IndentationError: unexpected indent

I prefer to do it this way:

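from selenium.webdriver.common.by import By

links = driver.find_elements(By.XPATH, '//a[contains(@href, "match/cotes-")]')

# Replace each WebElement in the list with its visible text.
n = 0
while n < len(links):
    links[n] = links[n].text
    n = n + 1

print(links)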

Thanks for your help.
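
The `IndentationError: unexpected indent` reported in this answer is most likely unrelated to the list comprehension itself. A plausible cause, assumed here and not confirmed by the thread, is that the line was pasted with leading whitespace left over from the indented code block. The sketch below reproduces that situation with `compile()`; the function name `check_indentation` and both sample strings are made up for illustration.

```python
def check_indentation(source):
    """Return None if `source` compiles, else the line number of the bad indent."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError as err:
        return err.lineno
    return None


# Second line keeps a stray four-space indent, as if pasted from an indented block.
stray = 'links = ["a", "b"]\n    data = [l for l in links]\n'

# Same statements with the stray indent removed.
flat = 'links = ["a", "b"]\ndata = [l for l in links]\n'

print(check_indentation(stray))
print(check_indentation(flat))
```

If that assumption holds, removing the leading spaces from the pasted line is enough, and the list comprehension and the `while` loop produce the same list of strings.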

Answer 1 · 0 · 2020-06-16 11:52:57
Answer 2 · 0 · 2020-06-16 12:13:26
