How do I scroll a specific part of a dynamic web page using Selenium WebDriver in Python?

Question

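I have found many references on scrolling an entire web page, but I want to scroll one specific section. I am working on marketwatch.com, specifically the Latest News tab. How can I scroll just this Latest News tab using Selenium WebDriver?

Below is my code. It returns the news headlines, but it keeps repeating the same headlines.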

from bs4 import BeautifulSoup
import urllib
import csv
import time
from selenium import webdriver


count = 0   
browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")

pageSource = browser.page_source

soup = BeautifulSoup(pageSource, 'lxml')

arkodiv = soup.find("ol", class_="viewport")

while browser.find_element_by_tag_name('ol'):
    browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(0.5)
    div = list(arkodiv.find_all('div', class_= "nv-details"))

    heading = []
    Data_11 = list(soup.find_all("div", class_ = "nv-text-cont"))          

    datetime = list(arkodiv.find_all("li", timestamp = True))
    for sa in datetime:
        sh = sa.find("div", class_ = "nv-text-cont")
        if sh.find("a", class_ = True):
            di = sh.text.strip()
            di = di.encode('ascii', 'ignore').decode('ascii')
        else:
            continue
        print di
        heading.append((di))       
        count = count+1         


    if 'End of Results' in arkodiv:
        print 'end'
        break
    else:
        continue
    print count

Answer 1

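This happens because the script you are executing scrolls the window to the bottom of the page, not the element that holds the news.

To keep scrolling inside the element that holds the news, you need to replace this: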

browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

with this:

browser.execute_script("document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999")
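The same idea can be written without a hard-coded 999999 by handing the element to the script. This is only a minimal sketch: the helper name scroll_to_bottom_of is hypothetical, and the selector "ol.viewport" is inferred from the question's code rather than checked against the live site. It relies on two things Selenium does provide: find_element(by, value), and the fact that extra arguments to execute_script show up in the script as arguments[0], arguments[1], and so on.

```python
def scroll_to_bottom_of(driver, css_selector):
    """Scroll one scrollable element (not the window) to its bottom.

    Hypothetical helper: `driver` is any Selenium WebDriver instance.
    """
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    element = driver.find_element("css selector", css_selector)
    # The element is passed in as arguments[0]; setting scrollTop to
    # scrollHeight moves that element's own scrollbar to the end.
    driver.execute_script(
        "arguments[0].scrollTop = arguments[0].scrollHeight;", element
    )
    return element
```

With the browser object from the question, a call would look like scroll_to_bottom_of(browser, "ol.viewport").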

Edit

Here is the complete working solution:

import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

count = 0
heading = set()  # created once, outside the loop, so duplicates are dropped across passes
browser = webdriver.Chrome()
browser.get("https://www.marketwatch.com/newsviewer")

while browser.find_element(By.TAG_NAME, 'ol'):
    # Scroll the news list itself, not the window, so that more items get loaded.
    browser.execute_script("document.documentElement.getElementsByClassName('viewport')[0].scrollTop = 999999")
    time.sleep(0.5)

    # Re-parse the page on every pass; a snapshot taken before scrolling goes stale.
    soup = BeautifulSoup(browser.page_source, 'lxml')
    arkodiv = soup.find("ol", class_="viewport")

    for sa in arkodiv.find_all("li", timestamp=True):
        sh = sa.find("div", class_="nv-text-cont")
        if sh is None or not sh.find("a", class_=True):
            continue
        di = sh.text.strip()
        di = di.encode('ascii', 'ignore').decode('ascii')
        if di not in heading:  # print and count each headline only once
            print(di)
            heading.add(di)
            count = count + 1

    # Test the element's text; `in arkodiv` would compare against its child nodes.
    if 'End of Results' in arkodiv.get_text():
        print('end')
        break

print(count)
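The loop above only ends when the text 'End of Results' appears. As a hedged alternative, the sketch below stops once the list's scrollHeight no longer grows, with an upper bound on the number of rounds. The function name scroll_until_stable and its parameters are hypothetical, and the approach assumes new items load within one pause; on a slow connection it could stop too early.

```python
import time


def scroll_until_stable(driver, css_selector, pause=0.5, max_rounds=50):
    """Scroll the matching element until its scrollHeight stops growing.

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    script = (
        "var el = document.querySelector(arguments[0]);"
        "if (!el) { return -1; }"
        "el.scrollTop = el.scrollHeight;"
        "return el.scrollHeight;"
    )
    last_height = None
    for round_no in range(1, max_rounds + 1):
        height = driver.execute_script(script, css_selector)
        if height == -1:
            raise LookupError("no element matches %r" % css_selector)
        if height == last_height:
            # Nothing was appended since the previous round: assume the end.
            return round_no
        last_height = height
        time.sleep(pause)  # give the page time to load the next batch
    return max_rounds
```

After scroll_until_stable(browser, "ol.viewport") returns, the page source can be parsed once instead of on every pass.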

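Edit 2

You may also want to change how the headlines are stored, because the current approach keeps duplicates in the list. Change it to a set so that this does not happen.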
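To illustrate the point about sets, here is a small sketch with made-up headlines. The helper name collect_new is hypothetical. The set must live outside the scraping loop; otherwise it is emptied on every pass and the same headlines get reported again.

```python
def collect_new(seen, batch):
    """Return the headlines in `batch` not seen before, in their original order.

    `seen` is a set that is updated in place, so it must outlive the loop.
    """
    fresh = []
    for title in batch:
        if title not in seen:
            seen.add(title)
            fresh.append(title)
    return fresh


seen = set()
first = collect_new(seen, ["A", "B"])        # first pass over the page
second = collect_new(seen, ["B", "C", "C"])  # later pass re-reads "B"
print(first)   # ['A', 'B']
print(second)  # ['C']
```

A plain set drops duplicates but not the order, which is why the sketch returns a list for each batch.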

Solution 1
2 Accepted 2018-04-10 22:53:57