
Open Tor Browser with Python using the selenium package

I am trying to scrape websites through the Tor Browser. I managed to open the browser with this code:

import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)

However, I am more familiar with the selenium library when it comes to web scraping. I tried the following code, but it raised a WebDriverException.

from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)

I would like to know what causes this error and how I can fix it.

Here is the full error I get:

WebDriverException                        Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
      1 url = 'https://www.google.com/'
----> 2 driver.get(url)

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
    434         Loads a web page in the current browser session.
    435         """
--> 436         self.execute(Command.GET, {'url': url})
    437 
    438     @property

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
    422         response = self.command_executor.execute(driver_command, params)
    423         if response:
--> 424             self.error_handler.check_response(response)
    425             response['value'] = self._unwrap_value(
    426                 response.get('value', None))

~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
    245                 alert_text = value['alert'].get('text')
    246             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 247         raise exception_class(message, screen, stacktrace)
    248 
    249     def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:

WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25
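
The last exception line carries the actual cause in URL-encoded form. A minimal sketch for decoding it with the standard library follows; the helper name `decode_neterror` is made up for this example, and the message string is copied from the traceback above.

```python
from urllib.parse import parse_qs


def decode_neterror(message: str) -> dict:
    """Return the decoded query fields of an about:neterror URL found in a message."""
    # Everything after the first '?' is the URL-encoded query string.
    _, _, query = message.partition("?")
    # parse_qs percent-decodes the values and returns a list per key; keep the first.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Message text copied from the WebDriverException above.
message = (
    "Reached error page: about:neterror?e=proxyConnectFailure"
    "&u=https%3A//www.google.com/&c=UTF-8"
    "&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server"
    "%20that%20is%20refusing%20connections."
)

for key, value in decode_neterror(message).items():
    print(f"{key}: {value}")
```

The `e` field is the error code (`proxyConnectFailure`) and the `d` field is the human-readable description. Tor Browser is preconfigured to send traffic through a local SOCKS proxy, so this error suggests that no Tor process was accepting connections when Selenium started the browser.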

You didn't show the error message, so I don't know what your problem is.

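When I try to use Tor Browser on Linux, it opens normally
(with only the warning "firefox_binary has been deprecated", which is not a problem),
but then it doesn't load the page in get(url) and doesn't show any error.
Perhaps Tor Browser, being a security-focused browser, blocks some of the functionality that Selenium needs to control the browser.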


But if the Tor network service is running, you can use it as a normal proxy server with a regular Firefox.

If the page http://127.0.0.1:9050 shows "This is a SOCKs proxy, not an HTTP proxy.",
then the Tor service is running and you can do this:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType

proxy = Proxy({
    'proxyType': ProxyType.MANUAL,
    'socksProxy': '127.0.0.1:9050',
    'socksVersion': 5,
})

options = Options()
options.proxy = proxy 
#options.binary_location = '/home/furas/bin/tor'  # doesn't work
#options.binary_location = '/path/to/normal/firefox'  # works

driver = webdriver.Firefox(options=options)  #  use path to standard `Firefox`

#url = 'https://www.google.com/'
url = 'https://icanhazip.com'     # shows the IP the site sees (a Tor exit node if the proxy works)
#url = 'https://httpbin.org/get'  # shows the IP and headers/cookies

driver.get(url)
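
As a variation on the code above, the same SOCKS proxy can be set through Firefox preferences, which also makes it possible to ask Firefox to resolve host names through the proxy. This is only a sketch: the helper names `tor_proxy_prefs` and `make_tor_driver` are invented for this example, and the host and port are the same assumptions as above.

```python
def tor_proxy_prefs(host="127.0.0.1", port=9050):
    """Build Firefox preferences for a manual SOCKS5 proxy."""
    return {
        "network.proxy.type": 1,                 # 1 = manual proxy configuration
        "network.proxy.socks": host,
        "network.proxy.socks_port": port,
        "network.proxy.socks_version": 5,
        "network.proxy.socks_remote_dns": True,  # resolve host names via the proxy
    }


def make_tor_driver(host="127.0.0.1", port=9050):
    """Start a regular Firefox whose traffic goes through the SOCKS proxy."""
    # Imported here so the preference helper works without selenium installed.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    for name, value in tor_proxy_prefs(host, port).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)


# Usage (needs Firefox, geckodriver and a running Tor service):
# driver = make_tor_driver()
# driver.get("https://icanhazip.com")
# driver.quit()
```

Keeping the preferences in a plain dictionary makes it easy to switch the port to 9150 without touching the driver setup.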

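P.S. Tor may use port 9150 instead of 9050. The Tor Browser bundle typically listens on 9150, while a standalone tor service typically uses 9050.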
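
To find out which of the two ports is in use, a quick probe with the standard library may help. This is a minimal sketch: the function name `find_listening_port` is made up for this example, and an open port only shows that something is listening there, not that it is Tor.

```python
import socket


def find_listening_port(host="127.0.0.1", ports=(9050, 9150), timeout=1.0):
    """Return the first port in `ports` that accepts a TCP connection, else None."""
    for port in ports:
        try:
            # create_connection raises OSError if nothing accepts the connection.
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            continue
    return None


if __name__ == "__main__":
    port = find_listening_port()
    if port is None:
        print("Nothing is listening on 9050 or 9150; is Tor running?")
    else:
        print(f"Something accepts connections on port {port}")
```

The port it reports can then be used in the 'socksProxy' value of the Selenium code above.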
