
Not rendering entire page with Selenium

I need the entire web page source for scraping, but I only get part of it.

Code trials:

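import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def get_soup(url):  # function name assumed; the question shows only the body
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')

    driver = webdriver.Chrome(options=options)
    driver.get(url)

    time.sleep(10)  # fixed delay to give the page's JavaScript time to render

    page = driver.page_source
    driver.quit()
    soup = BeautifulSoup(page, 'html5lib')  # requires the html5lib package

    return soup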

The website is: https://superbet.ro/pariuri-sportive/fotbal/live

The problem may be caused by <iframe> elements: the data you want can live inside an iframe, and you need to switch into that iframe to read it.

This might help: Switching into second iframe in Selenium Python3
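A minimal sketch of the frame switching, assuming Selenium 4 and a driver that has already loaded (and waited for) the page. The helper name `collect_iframe_sources` is hypothetical. It passes the string `"tag name"`, which is the value behind Selenium's `By.TAG_NAME`, and it visits only top-level iframes; nested frames would need a recursive variant.

```python
def collect_iframe_sources(driver):
    """Return the top-level page source followed by the source of each top-level <iframe>.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver that has
    already loaded the page.
    """
    sources = [driver.page_source]
    # "tag name" is the string value of Selenium's By.TAG_NAME locator.
    frame_count = len(driver.find_elements("tag name", "iframe"))
    for index in range(frame_count):
        # Look the frames up again on every pass: element references found
        # before switching frames can become stale afterwards.
        frames = driver.find_elements("tag name", "iframe")
        if index >= len(frames):
            break  # the page removed frames while we were iterating
        driver.switch_to.frame(frames[index])
        try:
            sources.append(driver.page_source)
        finally:
            # Always return to the top-level document before the next frame.
            driver.switch_to.default_content()
    return sources
```

Call it after `driver.get(url)` and any waits, for example `sources = collect_iframe_sources(driver)`, then parse each entry with BeautifulSoup to see which frame actually holds the data.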

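To extract the page source you need to:

  • Click the OK button to accept the cookies.

  • Use WebDriverWait with visibility_of_element_located() to wait until the target WebElement is visible.

  • You can use either of the following Locator Strategies:

    • Using CSS_SELECTOR: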

       driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
       WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a#CybotCookiebotDialogBodyLevelButtonAccept[href]"))).click()
       WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.section-header__title")))
       print(driver.page_source)

    • Using XPATH:

       driver.get("https://superbet.ro/pariuri-sportive/fotbal/live")
       WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='CybotCookiebotDialogBodyLevelButtonAccept' and @href]"))).click()
       WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//span[@class='section-header__title']")))
       print(driver.page_source)

  • Console Output:

    <html lang="en" style="--vh:6.13px;">
    <head>
      <meta charset="utf-8">
      <meta name="description" content="">
      <meta http-equiv="X-UA-Compatible" content="IE=edge">
      <meta name="viewport" content="width=device-width,initial-scale=1,user-scalable=0">
      <title>Superbet | Pariuri Sportive Online, Live, Casino, Loto, Virtuale</title>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consentcdn.cookiebot.com/consentconfig/a438e411-35ff-432b-863f-3d25bed37901/state.js"></script>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/logconsent.ashx?action=accept&amp;nocache=1612037844156&amp;referer=https%3A%2F%2Fsuperbet.ro%2Fpariuri-sportive%2Ffotbal%2Flive&amp;dnt=false&amp;method=strict&amp;clp=true&amp;cls=true&amp;clm=true&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;cbt=leveloptin&amp;hasdata=true"></script>
      <script type="text/javascript" charset="UTF-8" async="" src="https://consent.cookiebot.com/a438e411-35ff-432b-863f-3d25bed37901/cc.js?renew=false&amp;referer=superbet.ro&amp;dnt=false&amp;forceshow=false&amp;cbid=a438e411-35ff-432b-863f-3d25bed37901&amp;whitelabel=false&amp;brandid=CookieConsent&amp;framework="></script>
      <script type="text/javascript" async="" src="https://consent.cookiebot.com/uc.js?cbid=a438e411-35ff-432b-863f-3d25bed37901"></script>
      <script async="" src="https://www.googletagmanager.com/gtm.js?id=GTM-MN5RWMH"></script>
      <script>
        if (!window.location.hostname.includes('local')) {
          window.dataLayer = window.dataLayer || [];
          window.dataLayer.push({
            originalLocation: document.location.protocol + '//' + document.location.hostname + document.location.pathname + document.location.search
          });
          (function(w, d, s, l, i) {
            w[l] = w[l] || [];
            w[l].push({
              'gtm.start': new Date().getTime(),
              event: 'gtm.js'
            });
            var f = d.getElementsByTagName(s)[0],
              j = d.createElement(s),
              dl = l != 'dataLayer' ? '&l=' + l : '';
            j.async = true;
            j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl;
            f.parentNode.insertBefore(j, f);
          })(window, document, 'script', 'dataLayer', 'GTM-MN5RWMH');
        }
      </script>
    ...
    <iframe data-product="web_widget" title="No content" tabindex="-1" aria-hidden="true" src="about:blank" style="width: 0px; height: 0px; border: 0px; position: absolute; top: -9999px;"></iframe>
    <iframe name="__uspapiLocator" tabindex="-1" role="presentation" aria-hidden="true" title="Blank" style="display: none; position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe>
    <iframe tabindex="-1" role="presentation" aria-hidden="true" title="Blank" src="https://consentcdn.cookiebot.com/sdk/bc-v2.min.html" style="position: absolute; width: 1px; height: 1px; top: -9999px;"></iframe>
    <div>
      <iframe title="Deschide o miniaplicație widget unde puteți găsi mai multe informații" id="launcher" tabindex="-1" style="width: 142px; height: 50px; padding: 0px; margin: 10px 20px; position: fixed; bottom: 30px; overflow: visible; opacity: 0; border: 0px; z-index: 999998; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden;"></iframe>
      <iframe title="Găsiți mai multe informații aici" id="webWidget" tabindex="-1" style="width: 374px; max-height: calc(100vh - 32px); height: 572px; position: fixed; opacity: 0; border: 0px; transition-duration: 250ms; transition-timing-function: cubic-bezier(0.645, 0.045, 0.355, 1); transition-property: opacity, top, bottom; top: -9999px; visibility: hidden; z-index: 999999;"></iframe>
    </div>
    </body>
    </html>
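Putting the steps together, a self-contained sketch might look like this. It assumes Selenium 4 with a local Chrome/chromedriver, reuses the CSS_SELECTOR locators from this answer, and wraps the work in `try`/`finally` so the browser always quits. The function name `get_live_page_source` is hypothetical, and the Selenium imports sit inside the function only so the module can be imported where Selenium is not installed.

```python
def get_live_page_source(url, timeout=20):
    """Load `url`, accept the Cookiebot dialog, wait for the section header, return the HTML.

    Hypothetical helper. Selenium is imported lazily (an assumption of this sketch);
    the locators may stop matching if the site changes its markup.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # Step 1: dismiss the cookie banner so it no longer blocks the page.
        wait.until(EC.element_to_be_clickable(
            (By.CSS_SELECTOR, "a#CybotCookiebotDialogBodyLevelButtonAccept[href]"))).click()
        # Step 2: wait for a rendered element instead of sleeping a fixed time.
        wait.until(EC.visibility_of_element_located(
            (By.CSS_SELECTOR, "span.section-header__title")))
        # Step 3: page_source now reflects the DOM after the JavaScript has run.
        return driver.page_source
    finally:
        driver.quit()
```

The returned string can be passed to `BeautifulSoup(html, 'html5lib')` as in the question's code. Waiting on a concrete element replaces the fixed `time.sleep(10)`, which may return before the live markup has been rendered.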



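Statement: The technical posts on this site follow the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.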

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM