Rotating IP with selenium and Tor
I have a Selenium configuration for scraping a specific HTTP request. This request is sent only when I click a specific React element on the website. That's why I'm using Selenium; I couldn't find another way.
I must renew my IP each time I want to scrape this specific HTTP request. To achieve this, I use Tor. When I start my Python script it works very well: Tor sets a new IP and the script scrapes what I want. I have added a try/except to my script, so if it doesn't work the first time, it retries up to 10 times.
The problem is that when my script tries again, the IP no longer rotates. How can I achieve this?
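Restarting Tor to get a new IP only helps once the fresh process has finished building circuits, and a fixed short sleep after launching `tor` may return before that. One option is to launch Tor and wait for its own readiness message, with a timeout. The sketch below assumes Tor logs notices to stdout (its default when started from a terminal) and prints a line containing `Bootstrapped 100%` when ready. The function name and the 60-second default are illustrative choices, not taken from the question.

```python
import queue
import subprocess
import threading
import time
from typing import List


def start_tor_and_wait(cmd: List[str],
                       ready_marker: str = "Bootstrapped 100%",
                       timeout: float = 60.0) -> subprocess.Popen:
    """Start a Tor process and return it once its log reports full bootstrap.

    Raises TimeoutError (after terminating the process) if the marker does not
    appear within `timeout` seconds or the process exits first.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines: "queue.Queue[str]" = queue.Queue()
    ready = threading.Event()

    def drain() -> None:
        # Keep reading for the life of the process so Tor never blocks on a
        # full pipe; lines are only queued until the marker has been seen.
        for line in proc.stdout:
            if not ready.is_set():
                lines.put(line)
        lines.put("")  # sentinel: output closed (real lines are never empty)

    threading.Thread(target=drain, daemon=True).start()
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            line = lines.get(timeout=remaining)
        except queue.Empty:
            break
        if line == "":
            break  # the process exited before becoming ready
        if ready_marker in line:
            ready.set()
            return proc
    proc.terminate()
    proc.wait()
    raise TimeoutError(f"{cmd[0]} did not report {ready_marker!r} within {timeout}s")
```

Usage would be something like `tor_proc = start_tor_and_wait(["tor", "--HTTPTunnelPort", "8088"])`. Passing the command as a list avoids `shell=True`, and keeping the returned `Popen` lets you stop exactly that process with `tor_proc.terminate()` instead of `killall tor`.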
import os
import subprocess
import time
from random import randint

from fake_useragent import UserAgent
from selenium.webdriver.firefox.firefox_profile import FirefoxProfile
from selenium.webdriver.firefox.options import Options
from seleniumwire import webdriver

# selenium-wire sends the browser's traffic upstream to Tor's HTTP tunnel on port 8088
options_wire = {
    'proxy': {
        'http': 'http://localhost:8088',
        'https': 'https://localhost:8088',
        'no_proxy': ''
    }
}


def firefox_init():
    # Restart Tor so the next browser session should get a new exit IP
    os.system("killall tor")
    time.sleep(1)
    ua = UserAgent()
    user_agent = ua.random
    subprocess.Popen("tor --HTTPTunnelPort 8088", shell=True)
    time.sleep(2)  # fixed wait; Tor may not have finished bootstrapping yet
    return user_agent


def profile_firefox():
    profile = FirefoxProfile()
    profile.set_preference('permissions.default.image', 2)  # don't load images
    profile.set_preference('dom.ipc.plugins.enabled.libflashplayer.so', False)
    # Note: firefox_init() also restarts Tor as a side effect
    profile.set_preference("general.useragent.override", firefox_init())
    profile.set_preference("browser.privatebrowsing.autostart", True)
    profile.update_preferences()
    return profile


def options_firefox():
    options = Options()
    options.headless = False
    return options


def firefox_closing(driver):
    driver.quit()
    time.sleep(3)
    os.system('killall tor')


def headless(url):
    header, payload = None, None  # stay None if every attempt fails
    for attempt in range(10):
        profile = profile_firefox()
        options = options_firefox()
        driver = webdriver.Firefox(seleniumwire_options=options_wire, firefox_profile=profile, options=options, executable_path='******/headless_browser/geckodriver')
        driver.set_window_position(0, 0)
        driver.set_window_size(randint(1024, 2060), randint(1024, 4100))
        driver.get(url)
        time.sleep(randint(3, 8))
        try:
            # Raises NoSuchElementException if the button is not on the page
            button = driver.find_element_by_xpath("//*[@id=\"*******\"]/main/div/div/div[1]/div[2]/form/div/div[2]/div[1]/button")
            button.click()
            time.sleep(randint(3, 6))
            for request in driver.requests:
                if request.path == "https://api.*********.***/*******/*********":
                    # request.body is bytes: decode it rather than parsing str(bytes)
                    body = request.body.decode('utf-8')
                    if body:
                        header, payload = request.headers, body
                        print(header, payload)
                        break
            if payload:
                break  # success; `finally` below still closes the browser and Tor
        except Exception as exc:
            print(f"Attempt {attempt + 1} failed: {exc!r}")
            time.sleep(5)
        finally:
            # Runs once per attempt, on success and failure alike
            firefox_closing(driver)
    return header, payload


url = "https://check.torproject.org/?lang=fr"
headless(url)
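If a failed attempt is handled by calling `firefox_closing(driver)` in both an `except` and a `finally` branch, the driver is quit and Tor is killed twice for one failure. Also, a result variable assigned only on success raises `UnboundLocalError` at `return` when all ten attempts fail. The loop below is a generic sketch of the intended shape: each attempt gets a fresh session, teardown happens exactly once in `finally`, and the function returns `None` when nothing succeeds. `open_session`, `attempt` and `close_session` are hypothetical callables, not part of Selenium or selenium-wire.

```python
import time
from typing import Callable, Optional, TypeVar

S = TypeVar("S")  # whatever one attempt needs, e.g. a WebDriver
R = TypeVar("R")  # what a successful attempt returns, e.g. (headers, payload)


def retry_with_fresh_session(open_session: Callable[[], S],
                             attempt: Callable[[S], Optional[R]],
                             close_session: Callable[[S], None],
                             max_attempts: int = 10,
                             delay: float = 5.0) -> Optional[R]:
    """Run `attempt` in a fresh session until it returns a non-None result.

    If open_session() itself raises, the exception propagates (nothing to close).
    """
    for n in range(1, max_attempts + 1):
        session = open_session()
        try:
            result = attempt(session)
            if result is not None:
                return result  # `finally` still closes this session first
        except Exception as exc:
            # Log instead of silently swallowing; a bare `except:` would
            # also catch KeyboardInterrupt.
            print(f"attempt {n}/{max_attempts} failed: {exc!r}")
        finally:
            close_session(session)  # exactly one teardown per attempt
        if n < max_attempts:
            time.sleep(delay)
    return None  # every attempt failed; callers check for None
```

In the question's terms, `open_session` would restart Tor and build the Firefox driver, `attempt` would click the button and scan `driver.requests`, and `close_session` would be `firefox_closing`.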
Thank you
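Extracting a request body by calling `str()` on the bytes and splitting on `b'` only works for simple ASCII bodies, because `str(bytes)` returns Python's repr of the bytes object rather than its text. The sketch below contrasts that repr-splitting approach with decoding the bytes directly. The UTF-8 encoding is an assumption about what the API sends, and both function names are illustrative.

```python
def payload_via_repr_split(body: bytes) -> str:
    """Strip the b'...' wrapper from str(body) (the repr-splitting approach)."""
    return str(body).split("b'")[1][:-1]


def payload_via_decode(body: bytes, encoding: str = "utf-8") -> str:
    """Turn the raw request body into text directly."""
    return body.decode(encoding)


for body in (b'{"id": 1}', "café".encode("utf-8"), b"it's"):
    try:
        via_repr = payload_via_repr_split(body)
    except IndexError:
        # repr() switches to double quotes when the body contains a ' (and no ")
        via_repr = "<IndexError>"
    print(f"{via_repr!r:24} {payload_via_decode(body)!r}")
```

The first body comes out the same either way. The UTF-8 body comes back from the repr split as escape sequences (`caf\xc3\xa9`) instead of `café`. The body containing an apostrophe makes the split raise `IndexError`.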
Well, I can't see why it wouldn't renew the IP address, since you kill the Tor process. Even if you ran Tor as a systemd service, it would certainly get a new IP when you restart the service. But I can give you some directions:
ua = UserAgent(cache=False, use_cache_server=False)
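As an alternative to restarting the Tor process (neither poster uses it), a running Tor can be asked to use new circuits by sending `SIGNAL NEWNYM` on its control port. The minimal raw-socket sketch below assumes Tor was started with a control port, for example `tor --HTTPTunnelPort 8088 --ControlPort 9051`, and either no control authentication or a password configured through `HashedControlPassword`. The function names are hypothetical. NEWNYM only affects new connections, since connections that are already open keep their circuit, and Tor rate-limits the signal (roughly one every ten seconds).

```python
import socket
from typing import Optional


def _control_command(sock: socket.socket, command: str) -> str:
    """Send one Tor control-protocol command and return its one-line reply."""
    sock.sendall(command.encode("ascii") + b"\r\n")
    reply = b""
    while not reply.endswith(b"\r\n"):  # AUTHENTICATE/SIGNAL answer with one line
        chunk = sock.recv(1024)
        if not chunk:
            raise ConnectionError("Tor closed the control connection")
        reply += chunk
    return reply.decode("ascii", errors="replace").strip()


def request_new_identity(host: str = "127.0.0.1", port: int = 9051,
                         password: Optional[str] = None) -> None:
    """Ask a running Tor to use new circuits for new connections (NEWNYM)."""
    # No password: assumes the control port has no authentication configured.
    # The password is sent as a quoted string, so it must not contain quotes
    # or backslashes in this simple sketch.
    auth = "AUTHENTICATE" if password is None else f'AUTHENTICATE "{password}"'
    with socket.create_connection((host, port), timeout=10) as sock:
        for command in (auth, "SIGNAL NEWNYM"):
            reply = _control_command(sock, command)
            if not reply.startswith("250"):
                raise RuntimeError(f"{command.split()[0]} rejected by Tor: {reply}")
```

The `stem` library wraps the same protocol (`Controller.from_port(port=9051)`, `authenticate()`, `signal(Signal.NEWNYM)`) and also handles cookie authentication.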
So, to achieve this, I used a different proxy: selenium-wire is very good, but it needs to be fixed.
I used BrowserMob Proxy and set Tor's HTTPTunnelPort as its upstream proxy. As a result, you can capture and parse every HTTP request or response, the IP rotates every time, and it works with Tor's HTTPTunnelPort configuration.
proxy_params = {'httpProxy': 'localhost:8088', 'httpsProxy': 'localhost:8088'}
proxy_b = server.create_proxy(params=proxy_params)
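With BrowserMob Proxy, captured traffic comes back as a HAR document rather than as selenium-wire's `driver.requests`. The sketch below picks the wanted request out of such a document. It assumes the standard HAR 1.2 layout (`log.entries[].request` with `url`, a `headers` list of name/value pairs, and `postData.text`). The function name and the example URLs are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple


def find_request_in_har(har: Dict[str, Any],
                        url: str) -> Optional[Tuple[Dict[str, str], str]]:
    """Return (headers, body) of the first captured request to `url` that has a body."""
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        if request.get("url") != url:
            continue
        body = request.get("postData", {}).get("text", "")
        if body:
            # HAR stores headers as a list of {"name": ..., "value": ...};
            # repeated header names collapse to the last value here.
            headers = {h["name"]: h["value"] for h in request.get("headers", [])}
            return headers, body
    return None
```

With the `browsermobproxy` Python client, the dict would come from `proxy_b.har` after starting a capture with something like `proxy_b.new_har("capture", options={'captureHeaders': True, 'captureContent': True})` and clicking the element. Check those option names against the BrowserMob version in use.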
Thanks