Python 3.x, need help iterating through a proxy text file line by line

I'm relatively new to Python, and I am trying to build a program that visits a website through each proxy in a text file of proxies, one after another, until every proxy has been used. I found some code online and tweaked it to my needs. When I run the program, the proxies are used successfully, but not in order. For whatever reason, the first proxy gets used twice in a row, then the second, then the first again, then the third, and so on. It doesn't go through them one by one.

The proxies in the text file are organized as such:

123.45.67.89:8080
987.65.43.21:8080

And so on. Here's the code I am using:

from fake_useragent import UserAgent
import pyautogui
import webbrowser
import time
import random
import sys
import requests
from selenium import webdriver
import os
import re

proxylisttext = 'proxylistlist.txt'
useragent = UserAgent()
profile = webdriver.FirefoxProfile()
profile.set_preference("network.proxy.type", 1)
profile.set_preference("network.proxy_type", 1)

def Visiter(proxy1):
    try:
        proxy = proxy1.split(":")
        print ('Visit using proxy :',proxy1)
        profile.set_preference("network.proxy.http", proxy[0])
        profile.set_preference("network.proxy.http_port", int(proxy[1]))
        profile.set_preference("network.proxy.ssl", proxy[0])
        profile.set_preference("network.proxy.ssl_port", int(proxy[1]))
        profile.set_preference("general.useragent.override", useragent.random)
        driver = webdriver.Firefox(firefox_profile=profile)
        driver.get('https://www.iplocation.net/find-ip-address')
        time.sleep(2)
        driver.close()
    except:
        print('Proxy failed')
        pass

def loadproxy():
    try:
        get_file = open(proxylisttext, "r+")
        proxylist = get_file.readlines()
        writeused = get_file.write('used')
        count = 0
        proxy = []
        while count < 10:
            proxy.append(proxylist[count].strip())
            count += 1
            for i in proxy:
                Visiter(i)
    except IOError:
        print ("\n[-] Error: Check your proxylist path\n")
        sys.exit(1)

def main():
    loadproxy()
if __name__ == '__main__':
    main()

As I said, this code successfully navigates to the IP checker site using a proxy, but it doesn't go through the file line by line in order, and the same proxy gets used multiple times. More specifically, how can I ensure the program iterates through the proxies one by one, without repeating? I have searched exhaustively for a solution but haven't been able to find one, so any help would be appreciated. Thank you.

Your problem is with these nested loops, which don't appear to be doing what you want:

    proxy = []
    while count < 10:
        proxy.append(proxylist[count].strip())
        count += 1
        for i in proxy:
            Visiter(i)

The outer loop builds up the proxy list, adding one value each time until there are ten. After each value has been added, the inner loop iterates over the proxy list that has been built so far, visiting each item. So the first pass visits only the first proxy, the second pass visits the first and then the second, and so on.
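To see this without launching a browser, here is a minimal sketch that replaces Visiter with a hypothetical stand-in, record_visit, which only records the order of calls. The proxy names are placeholders, not real addresses.

```python
# Hypothetical stand-in for Visiter: records each call instead of opening a browser.
visited = []

def record_visit(proxy):
    visited.append(proxy)

# Placeholder lines, shaped like what readlines() would return.
proxylist = ["proxy1\n", "proxy2\n", "proxy3\n"]

proxy = []
count = 0
while count < len(proxylist):
    proxy.append(proxylist[count].strip())
    count += 1
    # Nested loop: revisits everything collected so far on every pass.
    for i in proxy:
        record_visit(i)

print(visited)
# ['proxy1', 'proxy1', 'proxy2', 'proxy1', 'proxy2', 'proxy3']
```

The first proxy shows up twice in a row, then the second, then the first again, which is the pattern described in the question.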

I suspect you want to unnest the loops. That way, the for loop will only run after the while loop has completed, and so it will only visit each proxy once. Try something like this:

    proxy = []
    while count < 10:
        proxy.append(proxylist[count].strip())
        count += 1
    for i in proxy:
        Visiter(i)

You could simplify that into a single loop, if you want. For instance, using itertools.islice to handle the bounds checking, you could do:

    import itertools

    for proxy in itertools.islice(proxylist, 10):
        Visiter(proxy.strip())

You could even run that directly on the file object (since files are iterable) rather than calling readlines first to read it into a list. (You might then need to add a seek call on the file before writing "used", but you may need that anyway: some OSs don't allow you to mix reads and writes without seeking in between.)
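As a rough sketch of that last idea, the function below iterates the file object directly and seeks to the end before writing. The name visit_first_n, its parameters, and the demo file are assumptions made for illustration; in the real program you would pass Visiter as the visit argument.

```python
import itertools
import os
import tempfile

def visit_first_n(path, visit, limit=10):
    """Call visit() once per proxy for at most `limit` lines, then append a marker."""
    with open(path, "r+") as proxy_file:
        # Files are iterable, so islice can cap the number of lines read.
        for line in itertools.islice(proxy_file, limit):
            line = line.strip()
            if line:
                visit(line)
        # Seek explicitly before switching from reading to writing.
        proxy_file.seek(0, os.SEEK_END)
        proxy_file.write("used\n")

# Demo on a throwaway file, collecting proxies in a list instead of visiting them.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "proxylist.txt")
    with open(demo_path, "w") as f:
        f.write("123.45.67.89:8080\n987.65.43.21:8080\n")
    seen = []
    visit_first_n(demo_path, seen.append)
    print(seen)
```

Because islice stops at the end of the file on its own, a list with fewer than ten proxies does not raise an IndexError the way proxylist[count] would.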

    while count < 10:
        proxy.append(proxylist[count].strip())
        count += 1
        for i in proxy:
            Visiter(i)

The for loop within the while loop means that every time you hit proxy.append, you call Visiter for every item already in proxy. That would explain why you're getting multiple hits per proxy.

As for the out-of-order issue, readlines() does keep the lines in file order, so the apparent reordering most likely comes from the same nested loop. Either way, I'd read the file with something like:

    with open('filepath', 'r') as file:
        for line in file:
            do_stuff_with_line(line)

With the above you also don't need to hold the whole file in memory at once, which can be nice for big files.

Good luck!
