How to get dynamic HTML and Javascript values from a page using PhantomJS
How to extract values from dynamic website using selenium and PhantomJS
I am trying to get the value of the timer (http://prntscr.com/kcbwd8) on this site: https://www.whenisthenextsteamsale.com/, and I want to store it in a variable.
import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')
# page_source returns the page's current DOM, i.e. after its JavaScript has run
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p", {"id": "subTimer"})
for item in result:
    print(item.text)
browser.quit()
I tried the code above, but it returns this error:
C:\Users\rober\Anaconda3\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
19:59:11
Is there any way to fix this? If not, is there another way to get a dynamic value from a site and store it in a variable?
Thanks.
PhantomJS is no longer maintained: https://groups.google.com/forum/m/#!topic/phantomjs/9aI5d-LDuNE
You should use headless Chrome or Firefox instead.
You will have to replace the following code:
browser = webdriver.PhantomJS()
browser.get('https://www.whenisthenextsteamsale.com/')
with:
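from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.firefox.service import Service
options = Options()
options.add_argument("--headless")  # run Firefox without opening a window
# Selenium 4 API; Selenium 3.x took firefox_options=options and executable_path="..." instead.
# With Selenium 4.6+ the service argument can usually be dropped, since Selenium Manager locates geckodriver.
browser = webdriver.Firefox(options=options, service=Service("Path to geckodriver.exe"))
browser.get('https://www.whenisthenextsteamsale.com/')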
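Whichever headless browser is used, the rest of the question's script stays the same, but find_all returns a list that the loop only prints. With BeautifulSoup the value can be kept directly, e.g. timer = soup.find("p", {"id": "subTimer"}).text. To illustrate the same idea without third-party packages, here is a minimal sketch that swaps in the standard library's html.parser for BeautifulSoup. TextByIdParser, text_by_id and the sample HTML are hypothetical; the real page's markup may differ.

```python
from html.parser import HTMLParser
from typing import Optional

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextByIdParser(HTMLParser):
    """Collect the text inside the first element whose id equals target_id."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # > 0 while the parser is inside the target element
        self.found = False  # becomes True once the target's start tag is seen
        self.chunks = []    # text fragments found inside the target

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element opened inside the target
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_by_id(html: str, element_id: str) -> Optional[str]:
    """Return the stripped text of the element with this id, or None if absent."""
    parser = TextByIdParser(element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip() if parser.found else None


# Hypothetical markup standing in for browser.page_source.
sample_html = '<html><body><h1>Next Steam sale</h1><p id="subTimer">19:59:11</p></body></html>'
timer = text_by_id(sample_html, "subTimer")  # stored in a variable, not just printed
print(timer)  # 19:59:11
```

In the real script, the call would be timer = text_by_id(browser.page_source, "subTimer"), made before browser.quit().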
Your code is fine, although you never use the headers you defined:
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
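The headers dictionary only takes effect when it is attached to an HTTP request; webdriver.PhantomJS() never reads it. A minimal sketch of attaching it with urllib.request, which the question already imports. fetch_raw_html is a hypothetical helper, and its network call is left commented out. Note that such a fetch returns the HTML as served, before any JavaScript runs, so the countdown is likely not filled in there. That is why the script drives a real browser instead.

```python
from urllib.request import Request, urlopen

URL = "https://www.whenisthenextsteamsale.com/"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}

# The headers dict only has an effect once it is attached to a request.
req = Request(URL, headers=headers)

# urllib stores header names via str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))


def fetch_raw_html(request: Request) -> str:
    """Download the page as served, i.e. before any of its JavaScript runs."""
    with urlopen(request, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

# html = fetch_raw_html(req)  # network call, left commented out here
```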
I ran my own script as follows:
import urllib
from bs4 import BeautifulSoup as bs
import time
import requests
from selenium import webdriver
from urllib.request import urlopen, Request
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.3"}
# point Selenium at a local PhantomJS binary instead of relying on $PATH
browser = webdriver.PhantomJS(executable_path=r'C:\Utility\phantomjs-2.1.1-windows\bin\phantomjs.exe')
browser.get('https://www.whenisthenextsteamsale.com/')
soup = bs(browser.page_source, "html.parser")
result = soup.find_all("p",{"id":"subTimer"})
for item in result:
    print(item.text)
browser.quit()
I see the same output on the console:
C:\Python\lib\site-packages\selenium\webdriver\phantomjs\webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
08:06:16
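The value printed on the last line (08:06:16) is a plain string. To keep it in a variable for later arithmetic, one option is to convert it into a datetime.timedelta. This sketch assumes the HH:MM:SS format shown above; the site may format longer countdowns differently, and parse_timer is a hypothetical helper.

```python
from datetime import timedelta


def parse_timer(text: str) -> timedelta:
    """Convert an 'HH:MM:SS' countdown string into a timedelta.

    Assumes exactly three colon-separated integer fields, as in the
    console output above; any other shape raises ValueError.
    """
    parts = text.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected HH:MM:SS, got {text!r}")
    hours, minutes, seconds = (int(p) for p in parts)
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


remaining = parse_timer("08:06:16")
print(remaining)                  # 8:06:16
print(remaining.total_seconds())  # 29176.0
```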
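It's worth mentioning that the Selenium team has already dropped default support for PhantomJS in the Selenium Java client and will do the same in the Selenium Python client. The warning you are seeing is emitted by the __init__() method of the PhantomJS driver, shown below: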
def __init__(self, executable_path="phantomjs",
             port=0, desired_capabilities=DesiredCapabilities.PHANTOMJS,
             service_args=None, service_log_path=None):
    """
    Creates a new instance of the PhantomJS / Ghostdriver.
    Starts the service and then creates new instance of the driver.
    :Args:
     - executable_path - path to the executable. If the default is used it assumes the executable is in the $PATH
     - port - port you would like the service to run, if left as 0, a free port will be found.
     - desired_capabilities: Dictionary object with non-browser specific
       capabilities only, such as "proxy" or "loggingPref".
     - service_args : A List of command line arguments to pass to PhantomJS
     - service_log_path: Path for phantomjs service to log to.
    """
    warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
                  'versions of Chrome or Firefox instead')
    self.service = Service(
        executable_path,
        port=port,
        service_args=service_args,
        log_path=service_log_path)
    self.service.start()