
How to extract values from a dynamic website using Selenium and PhantomJS

I'm trying to get the value of the timer (screenshot: http://prntscr.com/kcbwd8) on this website, https://www.whenisthenextsteamsale.com/, and would like to store it in a variable.

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})

for item in result:
    print(item.text)

browser.quit()

I tried the code above, but it returns this error:

C:\Users\rober\Anaconda3\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
19:59:11

Is there any way to fix this? If not, is there another way to get the dynamic value from the site and store it in a variable?

Thanks.

PhantomJS is no longer maintained: https://groups.google.com/forum/m/#!topic/phantomjs/9aI5d-LDuNE

You should use headless Chrome or Firefox instead.

You will have to replace the following code:

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

with:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")
# geckodriver path is passed Selenium 3 style; Selenium 4 expects it via a Service object instead
browser = webdriver.Firefox(options=options, executable_path="Path to geckodriver.exe")
browser.get('https://www.whenisthenextsteamsale.com/')

You will also need to download GeckoDriver.
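To store the timer in a variable rather than only printing it, something along these lines might work. It is a minimal sketch, not a tested solution for this site: the helper names (make_headless_firefox, read_timer, fetch_timer) are made up for illustration, geckodriver is assumed to be on the PATH, and the element id subTimer is taken from the question's code.

```python
def make_headless_firefox():
    """Start Firefox without a visible window (assumes geckodriver is on PATH)."""
    # Imported here so read_timer() below can be reused without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Firefox(options=options)


def read_timer(browser, element_id="subTimer"):
    """Return the visible text of the timer element, stripped of whitespace."""
    # "id" is the locator strategy string behind selenium's By.ID.
    return browser.find_element("id", element_id).text.strip()


def fetch_timer(url="https://www.whenisthenextsteamsale.com/"):
    """Open the page, read the timer once, and always close the browser."""
    browser = make_headless_firefox()
    try:
        browser.get(url)
        return read_timer(browser)
    finally:
        browser.quit()
```

Calling timer_value = fetch_timer() would then leave the text in timer_value. If the page fills the element in late, an explicit wait before reading it may be needed.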

Your code is fine as it is, although you never use the headers you defined as:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

I executed your own script as follows:

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://www.whenisthenextsteamsale.com/')
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})
for item in result:
    print(item.text)
browser.quit()

I do see the same output on my console:

C:\Python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
08:06:16
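The last line of that output is the timer text itself, so the script does work despite the warning. If you want the value as something you can calculate with rather than a plain string, a small sketch like this could help. It assumes the text always has the HH:MM:SS shape seen above; parse_timer is a hypothetical helper name.

```python
from datetime import timedelta


def parse_timer(text):
    """Convert an 'HH:MM:SS' timer string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.strip().split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


remaining = parse_timer("08:06:16")
print(remaining.total_seconds())  # 29176.0
```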

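It is worth mentioning that the Selenium team has already removed default support for PhantomJS in the Selenium Java Client and will follow suit in the Selenium Python Client. The warning you are observing is part of the PhantomJS __init__() method, shown below: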

def __init__(self, executable_path="phantomjs",
             port=0, desired_capabilities=DesiredCapabilities.PHANTOMJS,
             service_args=None, service_log_path=None):
    """
    Creates a new instance of the PhantomJS / Ghostdriver.

    Starts the service and then creates new instance of the driver.

    :Args:
     - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
     - port - port you would like the service to run, if left as 0, a free port will be found.
     - desired_capabilities: Dictionary object with non-browser specific
       capabilities only, such as "proxy" or "loggingPref".
     - service_args : A List of command line arguments to pass to PhantomJS
     - service_log_path: Path for phantomjs service to log to.
    """
    warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
                  'versions of Chrome or Firefox instead')
    self.service = Service(
        executable_path,
        port=port,
        service_args=service_args,
        log_path=service_log_path)
    self.service.start()
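Because the message comes from warnings.warn() with no category argument, it is an ordinary UserWarning and not an exception, which is why the timer value is still printed afterwards. The sketch below illustrates that with a stand-in function (start_phantomjs_like_driver is hypothetical and only mimics the warning call above, it does not start any driver), and shows how the standard warnings module could silence it if you decide to stay on PhantomJS for now.

```python
import warnings


def start_phantomjs_like_driver():
    """Hypothetical stand-in for PhantomJS.__init__: warns, then carries on."""
    warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
                  'versions of Chrome or Firefox instead')
    return "driver started"


# Record the warning to show that execution continues past it.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = start_phantomjs_like_driver()

print(result)                       # driver started
print(caught[0].category.__name__)  # UserWarning

# Suppress UserWarning only inside this block; filters are restored on exit.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)
    quiet_result = start_phantomjs_like_driver()
```

Silencing the warning only hides the message. Moving to headless Chrome or Firefox is still the longer-term fix.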
