
Python multiprocessing not increasing performance

I wrote a simple Python multiprocessing program that reads a bunch of lines from a CSV, calls an API for each one, and then writes the results to a new CSV. However, the performance of this program is the same as sequential execution, and changing the pool size does not have any effect. What is going wrong?

from multiprocessing import Pool
from random import randint
from time import sleep
import csv
import requests
import json



def orders_v4(order_number):
    # url, headers, and querystring are placeholders here; the real API
    # details were not included in the question.
    url = "https://example.com/orders/v4"
    headers = {}
    querystring = {"order_number": order_number}
    response = requests.request("GET", url, headers=headers, params=querystring, verify=False)
    return response.json()


newcsvFile=open('gom_acr_status.csv', 'w')
writer = csv.writer(newcsvFile)

def process_line(row):
    ol_key = row['\ufeffORDER_LINE_KEY']  # the first header name still carries the file's UTF-8 BOM
    order_number=row['ORDER_NUMBER']
    orders_json = orders_v4(order_number)
    oms_order_key = orders_json['oms_order_key']

    order_lines = orders_json["order_lines"]
    for order_line in order_lines:
        if ol_key==order_line['order_line_key']:
            print(order_number)
            print(ol_key)
            ftype = order_line['fulfillment_spec']['fulfillment_type']
            status_desc = order_line['statuses'][0]['status_description']
            print(ftype)
            print(status_desc)
            listrow = [ol_key, order_number, ftype, status_desc]
            writer.writerow(listrow)
            newcsvFile.flush()


def get_next_line():
    with open("gom_acr.csv", 'r') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            yield row


f = get_next_line()

t = Pool(processes=50)

for i in f:

    t.map(process_line, (i,))

t.join()
t.close()

EDIT: I just noticed you call map inside a loop. You need to call it only once: map is a blocking function, it is not async! Check out the docs for examples of correct usage.

A parallel equivalent of the map() built-in function (it supports only one iterable argument though). It blocks until the result is ready.

Original answer:

The fact that all processes write to the output file causes file-system contention.

If your process_line function just returned the rows (e.g. as a list of strings) and the main process wrote them all out after map returned, you should see a performance boost.
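A minimal sketch of that restructuring, assuming the orders_v4 helper and the CSV headers from the question: the worker returns the matched row instead of writing it, map is called exactly once over the whole input, and only the main process touches the output file.

from multiprocessing import Pool
import csv

# orders_v4 is assumed to be the API helper defined in the question.

def process_line(row):
    # Do the API lookup in the worker, but return the result instead of writing it.
    ol_key = row['\ufeffORDER_LINE_KEY']
    order_number = row['ORDER_NUMBER']
    orders_json = orders_v4(order_number)
    for order_line in orders_json["order_lines"]:
        if ol_key == order_line['order_line_key']:
            ftype = order_line['fulfillment_spec']['fulfillment_type']
            status_desc = order_line['statuses'][0]['status_description']
            return [ol_key, order_number, ftype, status_desc]
    return None  # no matching order line

if __name__ == '__main__':
    with open('gom_acr.csv', newline='') as csvfile:
        rows = list(csv.DictReader(csvfile))

    with Pool(processes=8) as pool:             # pick a size near your core count
        results = pool.map(process_line, rows)  # one call; blocks until all rows are done

    with open('gom_acr_status.csv', 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        writer.writerows(r for r in results if r is not None)

Because all writes now happen in the main process after the pool has finished, the file-system contention disappears and rows from different workers cannot interleave.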

Also, two notes:

  1. Try different numbers of processes, starting from the number of cores and going up; maybe 50 is too many (a rough timing sketch follows this list).
  2. The work done in each process seems (to me, at first glance) pretty short, so it is possible that the overhead of spawning new processes and orchestrating them is simply too big to benefit the task at hand.
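For note 1, here is a rough, hypothetical timing harness; it assumes the rows list and the restructured process_line from the sketch above, and it re-runs the full job once per candidate size, so it is only for measurement.

import time
from multiprocessing import Pool, cpu_count

if __name__ == '__main__':
    # Start at the core count and scale up; compare wall-clock times.
    for n in (cpu_count(), 2 * cpu_count(), 16, 50):
        start = time.perf_counter()
        with Pool(processes=n) as pool:
            pool.map(process_line, rows)
        print(f'{n} processes: {time.perf_counter() - start:.1f}s')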
