
urllib2 urlopen will be blocked in multiprocessing

I want to use multiprocessing to speed up report generation for every company.

The following is the test script:

from multiprocessing import Pool
import os, time, random, json, urllib, urllib2, uuid

def generate_report(url, cookie, company_id, period, remark):
    try:
        start = time.time()
        print('Run task %s (%s)... at: %s \n' % (company_id, os.getpid(), start))

        values = {
            'companies': json.dumps([company_id]),
            'month_year': period,
            'remark': remark
        }

        data = urllib.urlencode(values)

        headers = {
            'Cookie': cookie
        }
        url = "%s?pid=%s&uuid=%s" % (url, os.getpid(), uuid.uuid4().get_hex())
        request = urllib2.Request(url, data, headers)
        response = urllib2.urlopen(request)
        content = response.read()
        end = time.time()
        print('Task %s runs %0.2f seconds, end at: %s \n' % (company_id, (end - start), end))
        return content
    except Exception as exc:
        return str(exc)  # exc.message is deprecated; str(exc) is reliable

if __name__=='__main__':
    print('Parent process %s.\n' % os.getpid())
    p = Pool()

    url = 'http://localhost/fee_calculate/generate-single'
    cookie = 'xxx'
    company_ids = [17,15,21,19]
    period = '2017-08'
    remark = 'test add remark from python script'

    results = [p.apply_async(generate_report, args=(url,cookie,company_id,period,remark)) for company_id in company_ids]
    for r in results:
        print(r.get())

But I get the following result:

Run task 17 (15952)... at: 1506568581.98
Run task 15 (17192)... at: 1506568581.99
Run task 21 (18116)... at: 1506568582.01
Run task 19 (1708)... at: 1506568582.05

Task 17 runs 13.50 seconds, end at: 1506568595.48

{"success":true,"info":"Successed!"}
Task 15 runs 23.60 seconds, end at: 1506568605.59

{"success":true,"info":"Successed!"}
Task 21 runs 34.35 seconds, end at: 1506568616.36

{"success":true,"info":"Successed!"}
Task 19 runs 44.38 seconds, end at: 1506568626.44

{"success":true,"info":"Successed!"}

It seems that urllib2.urlopen(request) is being blocked: the requests are not sent in parallel, but sequentially.

To test multiprocessing, the fee_calculate/generate-single endpoint contains only the following relevant code:

sleep(10)

Please give me advice, thanks.

PS: Platform: Windows 10, Python 2.7, 4 CPUs

This isn't a multiprocessing issue. Multiprocessing is working as it should, which you can see from the fact that all of the tasks start at approximately the same time.
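You can confirm this with a quick sanity check (a hypothetical helper, not from the question's code): replace the HTTP call with a plain sleep. If the Pool really runs tasks in parallel, four half-second tasks on four workers finish in roughly half a second of wall time rather than two seconds.

```python
import time
from multiprocessing import Pool

def slow_task(seconds):
    # Stand-in for generate_report: just block for the given duration.
    time.sleep(seconds)
    return seconds

def run_pool(n_tasks, seconds):
    """Run n_tasks sleeping tasks on a Pool and return (wall_time, results)."""
    start = time.time()
    pool = Pool(n_tasks)
    try:
        async_results = [pool.apply_async(slow_task, (seconds,))
                         for _ in range(n_tasks)]
        results = [r.get() for r in async_results]
    finally:
        pool.close()
        pool.join()
    return time.time() - start, results

if __name__ == '__main__':
    elapsed, results = run_pool(4, 0.5)
    print('4 tasks took %.2f seconds' % elapsed)
```

With sleeps instead of HTTP requests the wall time stays close to one task's duration, which shows the Pool itself is dispatching in parallel and the serialization in your output must come from the server side.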

The task execution time is almost entirely dictated by the response time of your local endpoint at http://localhost/fee_calculate/generate-single . How are you running this server? If you look at the execution times for each report, you will notice that they increase in steps of roughly 10 seconds, which is the processing delay you artificially imposed on the server side ( sleep(10) ).

I suspect that your local server is single-threaded and can therefore only handle one request at a time. Each request must complete before the next one is processed, so making multiple concurrent requests like this does not reduce the total processing time.
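A minimal sketch of the server-side fix (this is not your actual server, just an illustration): socketserver.ThreadingMixIn gives each request its own thread, so slow requests overlap instead of queueing. Python 3 module names are used below; the Python 2 equivalents are BaseHTTPServer.HTTPServer and SocketServer.ThreadingMixIn.

```python
import threading
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from socketserver import ThreadingMixIn

DELAY = 0.5  # stand-in for the sleep(10) in generate-single

class SlowHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        time.sleep(DELAY)  # simulate slow report generation
        body = b'{"success":true,"info":"Successed!"}'
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the demo output quiet

class ThreadedHTTPServer(ThreadingMixIn, HTTPServer):
    """Handles each incoming request in a separate thread."""
    daemon_threads = True

def timed_requests(n):
    """Issue n concurrent GETs against the threaded server; return wall time."""
    server = ThreadedHTTPServer(('127.0.0.1', 0), SlowHandler)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    start = time.time()
    clients = [threading.Thread(
        target=urllib.request.urlopen,
        args=('http://127.0.0.1:%d/' % port,)) for _ in range(n)]
    for t in clients:
        t.start()
    for t in clients:
        t.join()
    server.shutdown()
    return time.time() - start
```

With a plain single-threaded HTTPServer in place of ThreadedHTTPServer, the same n requests would take about n * DELAY seconds, which is exactly the staircase pattern in your output.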
