
Generate List of Valid IP Addresses & Randomly Use in Python Selenium Loop

Disclaimer: This is my first foray into web scraping.

I have a list of ~400 search result URLs that I am trying to loop through using Selenium to collect information. At a certain point, I am redirected and presented with the following text:

"Your access to VINELink.com has been declined due to higher than normal utilization levels... You are attempting to access this website from the following ip address. Please make sure your firewall settings are not restricting access. [MY IP ADDRESS]"

Is there a way to generate a list of valid random IP addresses, select one randomly within a loop and feed it to the Selenium WebDriver to avoid being blocked?

I understand that there are ethical considerations to this question (in reality, I've contacted the site to explain my benign use case and ask if they can unblock my real IP address); I'm mostly interested in whether this is something one could do.

Abbreviated list of URLs:

['http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=2662',
 'http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=A21069',
 'http://www.vinelink.com/vinelink/servlet/SubjectSearch?siteID=34003&agency=33&offenderID=B59293',
 ...]

Abbreviated code for the loop (missing the actual list of valid IP addresses):

from selenium import webdriver

info = {}

for url in detail_urls:

    proxy = ### SELECT RANDOM IP ADDRESS FROM A LIST OF VALID IP ADDRESSES ###

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server='+str(proxy))
    driver = webdriver.Chrome(executable_path='/PATH/chromedriver', options=chrome_options)
    driver.get(url)
    driver.implicitly_wait(3)

    # find_elements_* returns an empty list when nothing matches,
    # whereas find_element_* raises NoSuchElementException
    buttons = driver.find_elements_by_xpath('//*[@id="ngVewDiv"]/div/div/div/div[3]/div[3]/div[2]/div/search-result/div/div[4]/div[1]/more-info/div[1]/button')
    if buttons:
        buttons[0].click()  # expand the "more info" panel
        name = driver.find_element_by_xpath('//*[@id="ngVewDiv"]/div/div/div/div[3]/div[3]/div[2]/div/search-result/div/div[1]/div/div[1]/span[1]/span[1]/div/div/div[2]/span')
        name = name.text
        offenderid = driver.find_element_by_xpath('//*[@id="ngVewDiv"]/div/div/div/div[3]/div[3]/div[2]/div/search-result/div/div[4]/div[1]/more-info/div[2]/div/div/div[2]/div[1]/div/div[2]/span')
        offenderid = offenderid.text
        info[name] = [offenderid]
    # close the browser window whether or not a result was found
    driver.close()

Is there a way to generate a list of valid random IP addresses, select one randomly within a loop and feed it to the Selenium WebDriver to avoid being blocked?

To get a random item from a sequence, use random.choice(seq) from the random module.

See: https://docs.python.org/3/library/random.html#random.choice

Example:

import random

proxies = ['10.0.1.1', '10.0.1.2', '10.0.1.3']
proxy = random.choice(proxies)
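
To tie this back to the loop in the question, a minimal sketch follows. It assumes each entry is a host:port string for a proxy you actually operate or are permitted to use. The addresses, the port 8080 and the helper name proxy_argument are placeholders for illustration, not real servers or part of any library.

```python
import random

# Hypothetical pool: replace with proxies you actually control or rent.
PROXIES = ['10.0.1.1:8080', '10.0.1.2:8080', '10.0.1.3:8080']


def proxy_argument(proxies, rng=random):
    """Return a Chrome --proxy-server argument for one randomly chosen proxy."""
    return '--proxy-server=' + rng.choice(proxies)


# Inside the loop from the question, this would be used as:
#     chrome_options = webdriver.ChromeOptions()
#     chrome_options.add_argument(proxy_argument(PROXIES))
print(proxy_argument(PROXIES))
```

Because a new choice is made on every call, each iteration of the loop may go through a different proxy, and the same proxy can be picked twice in a row.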

Note: Your question doesn't quite make sense as stated, because you said you want to "generate" a list of valid IP addresses. You can't just generate random IPs and expect them to work; you must supply your script with addresses that actually work. You will need the server infrastructure behind them (i.e. a pool of working proxy servers, one bound to each address in your list), because your requests will be routed through those servers. If you are just trying to spoof your IP and don't have a pool of servers to proxy through, the answer is "No, that won't work."
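
The difference between a well-formed address and a usable proxy can be sketched roughly as follows. ipaddress.ip_address only checks syntax, while the hypothetical helper accepts_connections (a name made up for this example) checks whether anything is listening on a given host and port. Even a successful TCP connection does not prove that the server will proxy your requests, so treat this as a first filter only.

```python
import ipaddress
import socket


def is_well_formed(address):
    """True if the string parses as an IPv4 or IPv6 address (syntax only)."""
    try:
        ipaddress.ip_address(address)
        return True
    except ValueError:
        return False


def accepts_connections(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A randomly generated address will usually pass the first check and fail the second, which is why the pool has to come from servers that really exist.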
