
Python: Can this script be multithreaded?

It would be great if someone could help me multi-thread this script and write the output to a text file. I am really new to coding, so please help me out.

#!/usr/bin/python

from tornado import ioloop, httpclient
from BeautifulSoup import BeautifulSoup
from mechanize import Browser
import requests
import urllib2
import socket
import sys

def handle_request(response):
    print response.code

global i

i = 0
i -= 1
if i == 0:
    http_client = httpclient.AsyncHTTPClient()
for url in open('urls.txt'):
    try:
        br = Browser()
        br.set_handle_robots(False)
        res = br.open(url, None, 2.5)
        data = res.get_data()
        soup = BeautifulSoup(data)
        title = soup.find('title')
        if soup.title != None:
            print url, title.renderContents(), '\n'
        i += 1
    except urllib2.URLError, e:
        print "Oops, timed out?", '\n'
    except socket.error,e:
        print "Oops, timed out?", '\n'
    except socket.timeout:
        print "Oops, timed out?", '\n'
print 'Processing of list completed, Cheers!!'
sys.exit()
try:
    ioloop.IOLoop.instance().start()
except KeyboardInterrupt:
    ioloop.IOLoop.instance().stop()

I am trying to grab the HTTP title of each host in a list.

The basic idea you have already implemented is a non-blocking HTTP client.

from tornado import ioloop, httpclient

def handle_request(response):
    if response.error:
        print "Error:", response.error
    else:
        print response.body

http_client = httpclient.AsyncHTTPClient()
for url in ["http://google.com", "http://twitter.com"]:
    http_client.fetch(url, handle_request)

# The callbacks only fire once the IOLoop is running:
ioloop.IOLoop.instance().start()

You can loop over your URLs, and the callback will be called as soon as the response for a specific URL becomes available.

I wouldn't mix mechanize, ioloop, and so on unless necessary.
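If you do want real threads instead of an event loop, and also want to write the titles to a text file as the question asks, the standard library's `concurrent.futures` covers both. A minimal Python 3 sketch — the inline `urls` list (standing in for `urls.txt`), the regex-based title extraction, and the `titles.txt` filename are illustrative choices, not from the original post:

```python
import concurrent.futures
import re
import urllib.request

def get_title(url):
    """Fetch a URL and return its <title> text, or an error note."""
    try:
        html = urllib.request.urlopen(url, timeout=2.5).read().decode('utf-8', 'replace')
        m = re.search(r'<title[^>]*>(.*?)</title>', html, re.S | re.I)
        return m.group(1).strip() if m else '(no title)'
    except Exception as e:  # timeouts, DNS failures, HTTP errors, ...
        return 'error: %s' % e

# In the original script these would come from urls.txt.
urls = ['http://google.com', 'http://www.python.org/']

# Fetch up to 10 URLs in parallel threads; pool.map preserves input order.
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as pool, \
        open('titles.txt', 'w') as out:
    for url, title in zip(urls, pool.map(get_title, urls)):
        out.write('%s\t%s\n' % (url, title))
```

Because `get_title` catches its own exceptions, one dead host only produces an `error:` line in the output instead of killing the whole run.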


Apart from that, I recommend grequests. It is a lightweight tool that satisfies your requirements.

import grequests
from bs4 import BeautifulSoup

urls = ['http://google.com', 'http://www.python.org/']

rs = (grequests.get(u) for u in urls)
res = grequests.map(rs)

for r in res:
    if r is None:  # the request failed
        continue
    soup = BeautifulSoup(r.text, 'html.parser')
    print "%s: %s" % (r.url, soup.title.text)
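Whichever client you use, writing the output to a text file (the other half of the question) is just a matter of collecting `(url, title)` pairs as they arrive. A minimal sketch — the pair values and the `titles.txt` filename are placeholders:

```python
# (url, title) pairs as collected from whichever HTTP client you used;
# these values are placeholders for illustration.
results = [
    ('http://google.com', 'Google'),
    ('http://www.python.org/', 'Welcome to Python.org'),
]

# One tab-separated line per host, easy to grep or import later.
with open('titles.txt', 'w') as out:
    for url, title in results:
        out.write('%s\t%s\n' % (url, title))
```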
