Open Tor Browser with Python using the selenium package
I am trying to scrape a website from Tor Browser. I have done this with the following code:
import webbrowser
url = 'http://www.google.com/'
webbrowser.register('firefox', None, webbrowser.BackgroundBrowser(r"C:\Users\Lenovo\Bureau\Tor Browser\Browser\firefox.exe"))
webbrowser.get('firefox').open(url)
However, when it comes to web scraping I am actually more familiar with the selenium library. I tried the following code, but it raised a WebDriverException.
from selenium import webdriver
from selenium.webdriver.firefox.firefox_binary import FirefoxBinary
binary = FirefoxBinary("C:/Users/Lenovo/Bureau/Tor Browser/Browser/firefox.exe")
driver = webdriver.Firefox(firefox_binary = binary)
url = 'https://www.google.com/'
driver.get(url)
I would like to know what causes this error and how I can fix it.
Here is the full error I get:
WebDriverException Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_17288/4279882525.py in <module>
1 url = 'https://www.google.com/'
----> 2 driver.get(url)
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in get(self, url)
434 Loads a web page in the current browser session.
435 """
--> 436 self.execute(Command.GET, {'url': url})
437
438 @property
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\webdriver.py in execute(self, driver_command, params)
422 response = self.command_executor.execute(driver_command, params)
423 if response:
--> 424 self.error_handler.check_response(response)
425 response['value'] = self._unwrap_value(
426 response.get('value', None))
~\anaconda3\envs\aa\lib\site-packages\selenium\webdriver\remote\errorhandler.py in check_response(self, response)
245 alert_text = value['alert'].get('text')
246 raise exception_class(message, screen, stacktrace, alert_text) # type: ignore[call-arg] # mypy is not smart enough here
--> 247 raise exception_class(message, screen, stacktrace)
248
249 def _value_or_default(self, obj: Mapping[_KT, _VT], key: _KT, default: _VT) -> _VT:
WebDriverException: Message: Reached error page: about:neterror?e=proxyConnectFailure&u=https%3A//www.google.com/&c=UTF-8&d=Firefox%20is%20configured%20to%20use%20a%20proxy%20server%20that%20is%20refusing%20connections.
Stacktrace:
WebDriverError@chrome://remote/content/shared/webdriver/Errors.jsm:181:5
UnknownError@chrome://remote/content/shared/webdriver/Errors.jsm:488:5
checkReadyState@chrome://remote/content/marionette/navigate.js:64:24
onNavigation@chrome://remote/content/marionette/navigate.js:312:39
emit@resource://gre/modules/EventEmitter.jsm:160:20
receiveMessage@chrome://remote/content/marionette/actors/MarionetteEventsParent.jsm:42:25
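You didn't show the error message, so I can't tell what your problem is.
When I try to use tor on Linux, it opens tor normally (with only the warning "firefox_binary has been deprecated", but that is not a problem), but then it doesn't load the page with get(url) and shows no error.
Maybe this is because tor is a security-focused browser and blocks some of the features that Selenium needs to control the browser.
But if you have tor.network running, you can use it as a normal proxy server for Firefox.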
If the page http://127.0.0.1:9050 shows "This is a SOCKs proxy, not an HTTP proxy.", then tor.network is running and you can do this:
from selenium import webdriver
from selenium.webdriver.firefox.options import Options
from selenium.webdriver.common.proxy import Proxy, ProxyType
proxy = Proxy({
'proxyType': ProxyType.MANUAL,
'socksProxy': '127.0.0.1:9050',
'socksVersion': 5,
})
options = Options()
options.proxy = proxy
#options.binary_location = '/home/furas/bin/tor' # doesn't work
#options.binary_location = '/path/to/normal/firefox' # works
driver = webdriver.Firefox(options=options) # use path to standard `Firefox`
url = 'https://www.google.com/'
url = 'https://icanhazip.com' # it shows your IP
#url = 'https://httpbin.org/get' # it shows your IP and headers/cookies
driver.get(url)
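The browser check described above (opening http://127.0.0.1:9050 by hand) can also be approximated from Python before starting Selenium. This is only a minimal sketch using the standard library: the helper name `speaks_socks5` is hypothetical, and it assumes the proxy accepts unauthenticated SOCKS5 clients, which is an assumption about the local tor setup rather than something shown in the question.

```python
import socket


def speaks_socks5(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if something at host:port answers a SOCKS5 greeting.

    Hypothetical helper for illustration; not part of selenium or tor.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            # SOCKS5 greeting: version 5, one auth method offered, method 0 (no auth)
            sock.sendall(b"\x05\x01\x00")
            reply = sock.recv(2)
    except OSError:
        # Connection refused, timed out, or otherwise unreachable
        return False
    # A SOCKS5 server answers with version 5 and the method it selected (0 = no auth)
    return len(reply) == 2 and reply[0] == 5 and reply[1] == 0
```

If `speaks_socks5(port=9050)` returns False, nothing is accepting SOCKS5 connections on that port, and Firefox would likely report a proxy connection failure similar to the one in the traceback.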
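P.S. Sometimes tor may use port 9150 instead of 9050. (The tor bundled with Tor Browser usually listens on 9150, while a standalone tor service defaults to 9050.)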
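Since the port can be either 9050 or 9150, one option is to try both and use whichever accepts a connection. The following is a rough sketch, not a definitive method: `first_open_port` is a hypothetical helper name, and it only checks that something is listening, not that it is actually tor.

```python
import socket


def first_open_port(candidates=(9050, 9150), host="127.0.0.1", timeout=1.0):
    """Return the first port in `candidates` that accepts a TCP connection, else None.

    Hypothetical helper for illustration; it does not verify that the
    listener is really a tor SOCKS proxy.
    """
    for port in candidates:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return port
        except OSError:
            # Nothing listening on this port (or unreachable); try the next one
            continue
    return None
```

The result could then be used when building the proxy settings, for example `'socksProxy': f'127.0.0.1:{port}'`, after checking that `port` is not None.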