Selenium threads: how to run multi-threaded browsers with proxies (Python)

Question

I'm writing a script that visits a website through proxies using multiple threads, but I'm stuck on the threading part. When I run the script below it opens 5 browsers, but all 5 use the same proxy. I want each of the 5 browsers to use a different proxy. Can someone help me complete it? Thank you.

Here is my script:

from selenium import webdriver
import time , random
import threading


def e():

    a = open("sock2.txt", "r")
    for line in a.readlines():

        b = line
        prox = b.split(":")
        IP = prox[0]
        PORT = int(prox[1].strip("\n"))
        print(IP)
        print(PORT)


        profile = webdriver.FirefoxProfile()
        profile.set_preference("network.proxy.type", 1)
        profile.set_preference("network.proxy.socks", IP)
        profile.set_preference("network.proxy.socks_port", PORT)
        try:

            driver = webdriver.Firefox(firefox_profile=profile)
            driver.get("http://www.whatsmyip.org/")
        except:
            print("Proxy Connection Error")
            driver.quit()
        else:
            time.sleep(random.randint(40, 70))
            driver.quit()
for i in range(5):
    t = threading.Thread(target=e)
    t.start()

(Wishing everyone a happy and lucky new year.)

Answer 1

(Personally, I think the problem is that every new thread goes through the text file from the beginning, because you never remove the proxies that have already been used.)

I came across the same problem when I was doing the same thing you are doing now. I know you would rather have help with your own code, but I am in a hurry to test it and still want to help you ;), so here is code that works for me. It even includes a task killer for Chrome (you just have to adapt it for Firefox).

If I were you, I would start the threads after opening the file, because it looks like you are opening the same file and reading it from the first line every time a thread starts.

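import os
import threading
import time

from selenium import webdriver

links = []  # fill in the links you want to visit
MAX_BROWSERS = 10  # browsers to start before pausing and cleaning up


def funk(proxy, website):
    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('--proxy-server=%s' % proxy)
    chromedriver = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'chromedriver')
    # Selenium 3 style call, as in the original answer; newer Selenium releases
    # expect options= (and a Service object for the driver path) instead.
    chrome = webdriver.Chrome(chromedriver, chrome_options=chrome_options)
    try:
        chrome.get(website)  # open the link, then do your stuff here
    except Exception:
        print('exception')
    finally:
        chrome.quit()


for link in links:
    number_of_used_proxies = 0
    with open('proxies.txt') as f:
        for line in f:
            proxy = line.strip()
            if not proxy:
                continue
            if number_of_used_proxies >= MAX_BROWSERS:
                # Give the running browsers time to finish, then kill leftovers.
                time.sleep(100)
                # killall is for macOS/Linux (the process name may differ);
                # on Windows use: os.system("taskkill /f /im chrome.exe")
                os.system("killall 'Google Chrome'")
                time.sleep(10)
                number_of_used_proxies = 0
            print(proxy)
            # Each proxy line gets its own browser, started after a 40 s delay.
            threading.Timer(40, funk, [proxy, link]).start()
            time.sleep(1)
            number_of_used_proxies += 1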

Hope it helps :)
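
The advice above (read the proxy file once, then hand each thread its own entry) can be sketched without Selenium. This is only an illustration under assumptions: `load_proxies`, `worker`, and `run` are hypothetical names, the sample addresses are made up, and the line that records a result stands in for the real browser work.

```python
import queue
import threading


def load_proxies(lines):
    """Turn raw 'ip:port' lines into a list, skipping blank lines."""
    return [line.strip() for line in lines if line.strip()]


def worker(proxy_queue, results, lock):
    # Each worker pulls proxies until the queue is empty,
    # so no proxy is handed to two threads.
    while True:
        try:
            proxy = proxy_queue.get_nowait()
        except queue.Empty:
            return
        # Hypothetical stand-in for launching a browser with this proxy.
        with lock:
            results.append((threading.current_thread().name, proxy))


def run(lines, num_threads=5):
    # The file content is read once, before any thread starts.
    proxy_queue = queue.Queue()
    for proxy in load_proxies(lines):
        proxy_queue.put(proxy)

    results = []
    lock = threading.Lock()
    threads = [
        threading.Thread(target=worker, args=(proxy_queue, results, lock),
                         name="worker-%d" % i)
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Made-up sample input; in practice this would be the lines of the proxy file.
sample = ["10.0.0.1:1080\n", "\n", "10.0.0.2:1080\n", "10.0.0.3:1080\n"]
used = run(sample, num_threads=2)
print(sorted(proxy for _, proxy in used))
```

Because the queue is filled before the threads start, the number of threads and the number of proxies no longer have to match: two threads can work through three proxies, and each proxy is still used exactly once.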

Answer 2

Dominik Lašo diagnosed it correctly: each thread processes the file from the beginning. Here is probably how it should look:

from selenium import webdriver
import time , random
import threading


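def e(ip, port):
    profile = webdriver.FirefoxProfile()
    profile.set_preference("network.proxy.type", 1)
    profile.set_preference("network.proxy.socks", ip)
    profile.set_preference("network.proxy.socks_port", port)
    driver = None
    try:
        # firefox_profile= is the Selenium 3 keyword used in the original answer;
        # newer Selenium releases expect an Options object instead.
        driver = webdriver.Firefox(firefox_profile=profile)
        driver.get("http://www.whatsmyip.org/")
    except Exception:
        print("Proxy Connection Error")
    else:
        time.sleep(random.randint(40, 70))
    finally:
        # Only quit if the browser actually started.
        if driver is not None:
            driver.quit()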

my_threads = []
with open("sock2.txt", "r") as fd:
    for line in fd.readlines():
        line = line.strip()
        if not line:
           continue
        prox = line.split(":")
        ip = prox[0]
        port = int(prox[1])
        print('-> {}:{}'.format(ip, port))
        t = threading.Thread(target=e, args=(ip, port,))
        t.start()
        my_threads.append(t)

for t in my_threads:
    t.join()
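
The parsing step in the loop above raises an exception on a malformed line (no colon, or a non-numeric port). A more defensive version might look like the sketch below. `parse_proxy_line` and `socks_preferences` are hypothetical helper names, not part of Selenium; the preference keys are the same three used in the answer.

```python
def parse_proxy_line(line):
    """Return (ip, port) for an 'ip:port' line, or None if the line is unusable."""
    line = line.strip()
    if not line:
        return None
    ip, sep, port_text = line.rpartition(":")
    if not sep or not ip or not port_text.isdigit():
        return None
    port = int(port_text)
    if not 0 < port < 65536:
        return None
    return ip, port


def socks_preferences(ip, port):
    """Build the Firefox preferences for a manual SOCKS proxy as a plain dict."""
    return {
        "network.proxy.type": 1,  # 1 selects manual proxy configuration
        "network.proxy.socks": ip,
        "network.proxy.socks_port": port,
    }


for raw in ["10.0.0.1:1080\n", "not-a-proxy\n", "10.0.0.2:abc\n"]:
    parsed = parse_proxy_line(raw)
    if parsed is None:
        print("skipping", repr(raw))
    else:
        print(socks_preferences(*parsed))
```

Each key/value pair in the returned dict could then be passed to `profile.set_preference` inside `e`, and lines that fail to parse can be skipped before any thread is started.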

Answer 3

vantuong: Here's how you can solve the problem with ThreadPoolExecutor.

Reference: https://docs.python.org/3/library/concurrent.futures.html

from selenium import webdriver
import time, random
import concurrent.futures

MAX_WORKERS = 5

def get_proxys(data_file):
    proxys = []
    with open(data_file, "r") as fd:
        for line in fd.readlines():
            line = line.strip()
            if not line:
               continue
            prox = line.split(":")
            ip = prox[0]
            port = int(prox[1])
            proxys.append((ip, port))
    return proxys


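def e(ip, port):
    profile = webdriver.FirefoxProfile()
    profile.set_preference("network.proxy.type", 1)
    profile.set_preference("network.proxy.socks", ip)
    profile.set_preference("network.proxy.socks_port", port)
    driver = None
    try:
        # firefox_profile= is the Selenium 3 keyword used in the original answer;
        # newer Selenium releases expect an Options object instead.
        driver = webdriver.Firefox(firefox_profile=profile)
        driver.get("http://www.whatsmyip.org/")
    except Exception:
        print("Proxy Connection Error")
    else:
        time.sleep(random.randint(40, 70))
    finally:
        # Only quit if the browser actually started.
        if driver is not None:
            driver.quit()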


with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as executor:
    proxys = get_proxys('sock2.txt')
    tasks = {executor.submit(e, proxy[0], proxy[1]): proxy for proxy in proxys}
    for task in concurrent.futures.as_completed(tasks):
        proxy = tasks[task]
        try:
            data = task.result()
        except Exception as exc:
            print('{} generated an exception: {}'.format(proxy, exc))
        else:
            print('{} completed successfully'.format(proxy))

Fun exercise: try playing around with different values of MAX_WORKERS.
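
One way to try the exercise without opening real browsers is to replace the browser session with a short sleep and count how many tasks run at the same time. This is a rough sketch; `measure_peak` and `fake_browser_session` are hypothetical names, and the sleep duration is arbitrary.

```python
import concurrent.futures
import threading
import time


def measure_peak(max_workers, num_tasks=8, task_seconds=0.05):
    """Run num_tasks dummy tasks and return the highest number running at once."""
    lock = threading.Lock()
    state = {"running": 0, "peak": 0}

    def fake_browser_session(_):
        with lock:
            state["running"] += 1
            state["peak"] = max(state["peak"], state["running"])
        time.sleep(task_seconds)  # stands in for the page visit
        with lock:
            state["running"] -= 1

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        # list() forces all tasks to finish before the peak is read.
        list(executor.map(fake_browser_session, range(num_tasks)))
    return state["peak"]


for workers in (1, 2, 5):
    print(workers, "workers -> peak concurrency", measure_peak(workers))
```

The peak never exceeds `max_workers`, which is what keeps the number of simultaneously open browsers bounded regardless of how many proxies are in the file.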

Answer summary (3 answers)

Answer 1: score 4, posted 2018-12-30 10:35:04
Answer 2: score 4, accepted, posted 2018-12-30 14:35:56
Answer 3: score 2, posted 2018-12-31 07:27:19
