
Scraping dynamic content with Selenium

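I'm trying to learn how to scrape content from the web. In an earlier attempt I thought I was dealing with dynamic content, but it turned out the content was simply hidden under a tag that does appear in the page source. Thanks to the community here, I was able to pull that data out easily with Beautiful Soup and pandas.

For my next challenge, I'm trying to get data from a site whose content really is generated dynamically (it does not appear in the page source). My code is below. I can grab the container that holds the dynamic content, but it comes back empty. In the developer tools I can see that the div with class="event 2-2-1 row" contains data, yet whenever I try to select on those tags, nothing is found.

Can anyone point me in the right direction? I've searched this forum but haven't found an answer.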

from selenium import webdriver
import re
from bs4 import BeautifulSoup


start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"#input("Enter the results URL: ")
driver = webdriver.Chrome()
driver.implicitly_wait(10)
driver.get(start_url)
content = driver.find_element_by_xpath('//*[@id="container-fluid"]')
print(content)

This is what the print statement gives me:

<selenium.webdriver.remote.webelement.WebElement (session="99ca6419fd181c0bdd39797e20c739df", element="0.7688034456332402-1")>

I managed to parse the dynamic content with the following code:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC  # available since 2.26.0
from selenium.webdriver.support.wait import WebDriverWait

start_url = "https://www.tissottiming.com/Live/Index?id=0003100005010105FFFFFFFFFFFFFFF2&style=Tissot"
heat_xpath = "//div[@class='heat 2_2_1_1_1 row']"

driver = webdriver.Chrome()
driver.get(start_url)

# Wait up to 15 seconds for the JavaScript-rendered heat container to appear.
# until() returns the located element, so no second lookup is needed.
# (find_element_by_xpath was removed in Selenium 4.3; use By.XPATH locators instead.)
x = WebDriverWait(driver, 15).until(EC.presence_of_element_located((By.XPATH, heat_xpath)))

# Printing the WebElement itself only shows its repr; innerHTML returns the markup.
print(x.get_attribute('innerHTML'))
driver.quit()

This prints the inner HTML of the heat container:

<div class="name row"><span>HEAT 1</span></div><div class="heatsheaders row rowtitle"><div class="col-xs-05 rank">Rank</div><div class="col-xs-05 bib">Bib</div><div class="col-xs-3 longname">Name</div><div class="col-xs-1 nation">Nat</div><div class="col-xs-5 run_title"><div class="RunName col-xs-4">1ST RACE</div><div class="RunName col-xs-4">2ND RACE</div><div class="RunName col-xs-4">DECIDER</div></div><div class="col-xs-1 qualified"></div><div class="col-xs-1 points">Time</div></div><div class="rider 2_2_1_1_1_1_1 row" data-sortorder="1" data-inter-pos-x="2" data-inter-pos-y="342" data-final-pos-x="2" data-final-pos-y="342" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank col-xs-05"><span>1</span></div><div class="bib col-xs-05"><span>52</span></div><div class="longname col-xs-3"><span>GLAETZER Matthew</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_1_1_1 col-xs-4"><div class="time row"><span>10.218</span></div><div class="points row"><span>70,464</span></div></div><div class="run 2_2_1_1_1_1_1_1_2 col-xs-4"><div class="time row"><span>0.000</span></div><div class="points row"><span>0,000</span></div></div><div class="run 2_2_1_1_1_1_1_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QG</span></div><div class="points col-xs-1"><span></span></div></div><div class="rider 2_2_1_1_1_1_2 row" data-sortorder="2" data-inter-pos-x="2" data-inter-pos-y="422" data-final-pos-x="2" data-final-pos-y="422" style="transition: all 600ms ease 0ms, opacity 600ms linear; display: block; transform: translate(0px, 0px);" data-bound="true"><div class="rank 
col-xs-05"><span>2</span></div><div class="bib col-xs-05"><span>53</span></div><div class="longname col-xs-3"><span>HART Nathan</span><div class="teamname "><span>AUSTRALIA</span></div></div><div class="nation col-xs-1"><span><div class="img_flag">AUS<img src="/Content/images/flags/AUS.png" alt="AUS national flag"></div></span></div><div class="run_group col-xs-5"><div class="run 2_2_1_1_1_1_2_1_1 col-xs-4"><div class="time row"><span>+0.028</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_2 col-xs-4"><div class="time row"><span>+0.000</span></div><div class="points row"><span></span></div></div><div class="run 2_2_1_1_1_1_2_1_3 col-xs-4"><div class="time row"><span></span></div><div class="points row"><span></span></div></div></div><div class="qualified col-xs-1"><span>QB</span></div><div class="points col-xs-1"><span></span></div></div>
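
Since the question mentions Beautiful Soup, here is a minimal sketch of how the innerHTML string above could be turned into one record per rider. It is not part of the original answer: the function name parse_riders, the helper _text, the dictionary keys, and the trimmed SAMPLE_HEAT_HTML are all illustrative assumptions, and the selectors rely only on the class names visible in the printed output.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the printed innerHTML (one rider, one run), used only
# to illustrate the parsing step without launching a browser.
SAMPLE_HEAT_HTML = (
    '<div class="name row"><span>HEAT 1</span></div>'
    '<div class="rider 2_2_1_1_1_1_1 row" data-sortorder="1">'
    '<div class="rank col-xs-05"><span>1</span></div>'
    '<div class="bib col-xs-05"><span>52</span></div>'
    '<div class="longname col-xs-3"><span>GLAETZER Matthew</span>'
    '<div class="teamname "><span>AUSTRALIA</span></div></div>'
    '<div class="nation col-xs-1"><span><div class="img_flag">AUS'
    '<img src="/Content/images/flags/AUS.png" alt="AUS national flag">'
    '</div></span></div>'
    '<div class="run_group col-xs-5">'
    '<div class="run 2_2_1_1_1_1_1_1_1 col-xs-4">'
    '<div class="time row"><span>10.218</span></div>'
    '<div class="points row"><span>70,464</span></div></div>'
    '</div>'
    '<div class="qualified col-xs-1"><span>QG</span></div>'
    '</div>'
)


def _text(row, selector):
    """Return the stripped text of the first match under row, or ''."""
    node = row.select_one(selector)
    return node.get_text(strip=True) if node else ""


def parse_riders(heat_html):
    """Hypothetical helper: extract one dict per 'rider' row in a heat."""
    soup = BeautifulSoup(heat_html, "html.parser")
    riders = []
    for row in soup.select("div.rider"):
        riders.append({
            "rank": _text(row, "div.rank > span"),
            "bib": _text(row, "div.bib > span"),
            "name": _text(row, "div.longname > span"),
            "team": _text(row, "div.teamname > span"),
            "nation": _text(row, "div.img_flag"),
            # One entry per run column; unfinished runs yield empty strings.
            "times": [t.get_text(strip=True)
                      for t in row.select("div.run div.time > span")],
            "qualified": _text(row, "div.qualified > span"),
        })
    return riders


if __name__ == "__main__":
    for rider in parse_riders(SAMPLE_HEAT_HTML):
        print(rider)
```

In the Selenium script, the string returned by x.get_attribute('innerHTML') would be passed to parse_riders in place of the sample, and the resulting list of dicts could then be handed to pandas.DataFrame if a table is wanted.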
