
Bulk HTTP Status requests

I don't have any coding knowledge. I need to run a script that can fetch the HTTP status codes of a list of sites. The output should look like this:

domain.com 301
domain.com 200
domain.com 301
domain.com 200

I need to check a huge list of sites, around 200k URLs, so it also has to be fast. I have proxies available so that it can run multi-threaded.

Any help or ideas are highly appreciated!

Below are a threaded and a serial approach. I have not tested how many concurrent threads this can support, so you may want to add some code to limit that number.

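from threading import Thread
import urllib3

# Suppress urllib3's warning about unverified HTTPS requests.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

# One PoolManager is thread-safe, so every thread can share it.
http = urllib3.PoolManager()


class Site(Thread):
    """Fetch one site in its own thread and print its HTTP status."""

    def __init__(self, thissite):
        Thread.__init__(self)
        self.site = thissite
        print('Started Thread for', self.site)

    def run(self):
        try:
            # redirect=False reports a 301/302 itself instead of following it.
            r = http.request('GET', self.site, redirect=False)
            print('Thread Result', self.site, r.status)
        except Exception:
            # The request itself failed (DNS, timeout, refused connection),
            # so there is no status code to report.
            print('Thread Result', self.site, 'ERROR')


# Read the site list, one URL per line, skipping blank lines.
sitelist = []
with open('D:\\Downloads\\SiteList.txt', 'r') as f:
    for x in f:
        if x.strip():
            print('[' + x.strip() + ']')
            sitelist.append(x.strip())

# Threaded approach: one thread per site, with no upper limit.
threads = []
for site in sitelist:
    check = Site(site)
    check.start()
    threads.append(check)

# Wait for the threads so their output does not mix with the serial run.
for t in threads:
    t.join()

# Serial approach: one request at a time.
for site in sitelist:
    try:
        r = http.request('GET', site, redirect=False)
        print('Serial Result', site, r.status)
    except Exception:
        print('Serial Result', site, 'ERROR')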
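
One way to put a limit on the number of concurrent threads is a fixed-size thread pool. The following is only a rough sketch, not something I have run against 200k URLs. It swaps urllib3 for the standard library's http.client so that it has no dependencies, and http.client does not follow redirects, so a 301 is reported as 301. The names fetch_status and check_sites, the max_workers default of 50, and the 'ERROR' marker for failed requests are my own assumptions. Proxies are not handled here.

```python
from concurrent.futures import ThreadPoolExecutor
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or 'ERROR' if the request fails."""
    # Assume plain http when the list entry has no scheme (e.g. "domain.com").
    parts = urlsplit(url if '://' in url else 'http://' + url)
    if parts.scheme == 'https':
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    target = parts.path or '/'
    if parts.query:
        target += '?' + parts.query
    conn = None
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        conn.request('GET', target)
        # Only the status line and headers are read; the body is discarded.
        return conn.getresponse().status
    except Exception:
        # DNS failure, timeout, refused connection, malformed URL, ...
        return 'ERROR'
    finally:
        if conn is not None:
            conn.close()


def check_sites(urls, max_workers=50, fetch=fetch_status):
    """Check all urls with at most max_workers threads running at once.

    Returns a list of (url, status) pairs in the same order as urls.
    """
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order, not completion order.
        statuses = executor.map(fetch, urls)
        return list(zip(urls, statuses))
```

To get the output format from the question, loop over the pairs returned by check_sites(sitelist) and print the URL followed by the status on each line. The right max_workers value depends on your machine and network, so start small and raise it while watching for errors.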
