How to extract a value from a dynamic website using Selenium and PhantomJS

Question

I am trying to get the value of the timer (screenshot: http://prntscr.com/kcbwd8) on this website, https://www.whenisthenextsteamsale.com/, and I want to store it in a variable.

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})

for item in result:
    print(item.text)

browser.quit()

I tried the code above, but it returns this error:

C:\Users\rober\Anaconda3\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
19:59:11

Is there any way to fix this? If not, is there another way to get a dynamic value from the site and store it in a variable?

Thanks.

Answer 1

PhantomJS is no longer maintained: https://groups.google.com/forum/m/#!topic/phantomjs/9aI5d-LDuNE

You should use headless Chrome or Firefox instead.

You will have to replace the following code:

browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')

with:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options

options = Options()
options.add_argument("--headless")
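# Newer Selenium releases take options= (firefox_options= is deprecated) and,
# from Selenium 4, a Service object instead of executable_path.
browser = webdriver.Firefox(options=options, executable_path="Path to geckodriver.exe")
browser.get('https://www.whenisthenextsteamsale.com/')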

You will also need to download GeckoDriver and point executable_path at it.
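Headless Chrome can be used the same way. The following is only a sketch, not a definitive implementation: it assumes a reasonably recent Selenium with chromedriver available, and the names `TIMER_URL`, `make_headless_chrome`, `read_timer`, and `main` are made up for illustration. It reads the `subTimer` element from the question directly through the driver and keeps its text in a variable.

```python
TIMER_URL = "https://www.whenisthenextsteamsale.com/"


def make_headless_chrome():
    """Start Chrome without a visible window (assumes chromedriver is available)."""
    # Imported here so read_timer() below can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def read_timer(driver, url=TIMER_URL, element_id="subTimer"):
    """Load the page and return the text of the timer element."""
    driver.get(url)
    # "id" is the string behind Selenium's By.ID locator strategy.
    return driver.find_element("id", element_id).text.strip()


def main():
    browser = make_headless_chrome()
    try:
        timer_value = read_timer(browser)  # the value now lives in a variable
        print(timer_value)
    finally:
        browser.quit()
```

Calling `main()` should print the timer text. If the page fills the element in only after some delay, an explicit wait before reading it may be needed.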

Answer 2

Your code works fine, although you never use the headers you defined:

headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

I ran your own script as follows:

import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://www.whenisthenextsteamsale.com/')
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})
for item in result:
    print(item.text)
browser.quit()

I see the same output on my console:

C:\Python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
08:06:16
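The last line of that output, 08:06:16, is the timer text itself, so the script is already producing the value. If the goal is to keep it in a variable for later arithmetic, one option is to convert it to a `timedelta`. This is a minimal sketch that assumes the HH:MM:SS format seen in the outputs here; the site may display other formats (for example with days) that this hypothetical `parse_timer` helper does not handle.

```python
from datetime import timedelta


def parse_timer(text):
    """Convert an 'HH:MM:SS' string into a timedelta; raise ValueError otherwise."""
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"unexpected timer format: {text!r}")
    hours, minutes, seconds = (int(part) for part in parts)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


remaining = parse_timer("08:06:16")
print(remaining.total_seconds())  # 29176.0
```

In the script above, `item.text` would take the place of the literal string.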

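It is worth mentioning that the Selenium team has already removed default support for PhantomJS from the Selenium Java client, and the Selenium Python client will follow. The warning you are seeing comes from the __init__() method of the PhantomJS driver, shown below: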

def __init__(self, executable_path="phantomjs",
             port=0, desired_capabilities=DesiredCapabilities.PHANTOMJS,
             service_args=None, service_log_path=None):
    """
    Creates a new instance of the PhantomJS / Ghostdriver.

    Starts the service and then creates new instance of the driver.

    :Args:
     - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
     - port - port you would like the service to run, if left as 0, a free port will be found.
     - desired_capabilities: Dictionary object with non-browser specific
       capabilities only, such as "proxy" or "loggingPref".
     - service_args : A List of command line arguments to pass to PhantomJS
     - service_log_path: Path for phantomjs service to log to.
    """
    warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
                  'versions of Chrome or Firefox instead')
    self.service = Service(
        executable_path,
        port=port,
        service_args=service_args,
        log_path=service_log_path)
    self.service.start()
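Because __init__() only calls warnings.warn() and then carries on starting the service, the message is a warning rather than an error, which is why the timer value is still printed afterwards. If PhantomJS has to stay for the time being, the warning can be silenced around the constructor call with the standard `warnings` module. A sketch, where `call_quietly` is a hypothetical helper name:

```python
import warnings


def call_quietly(factory, *args, **kwargs):
    """Call factory(*args, **kwargs) while ignoring any UserWarning it emits."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", UserWarning)
        return factory(*args, **kwargs)
```

For example, `browser = call_quietly(webdriver.PhantomJS)` would start the driver without the deprecation message, on Selenium releases that still ship `webdriver.PhantomJS`. Switching to a headless browser remains the longer-term fix.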

Answer 1
1 2018-07-29 07:37:54

Answer 2
1 2018-07-31 14:10:35
