
Python - multithread with multiple arrays passing args to function

I'm trying to add multithreading to a very time-consuming program, and I came across this SO answer: https://stackoverflow.com/a/28463266/3451339, which basically offers this solution, including a variant for multiple arrays:

from multiprocessing.dummy import Pool as ThreadPool

pool = ThreadPool(4)
results = pool.map(my_function, my_array)

# Close the pool and wait for the work to finish
pool.close()
pool.join()

and, for passing multiple arrays:

results = pool.starmap(function, zip(list_a, list_b))
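Note that `zip` pairs the lists element by element, whereas nested loops visit every combination, which is what `itertools.product` produces. A small sketch with a hypothetical two-argument `add` function shows the difference:

```python
from itertools import product
from multiprocessing.dummy import Pool as ThreadPool  # thread-backed Pool


def add(a, b):
    # Hypothetical stand-in for the real two-argument function
    return a + b


list_a = [1, 2, 3]
list_b = [10, 20, 30]

with ThreadPool(4) as pool:
    # zip: one call per position -> (1, 10), (2, 20), (3, 30)
    paired = pool.starmap(add, zip(list_a, list_b))
    # product: one call per combination -> 3 * 3 = 9 calls
    crossed = pool.starmap(add, product(list_a, list_b))

print(paired)   # [11, 22, 33]
print(crossed)  # [11, 21, 31, 12, 22, 32, 13, 23, 33]
```

`starmap` returns results in input order, so `crossed` follows the order of `product`, where the last iterable varies fastest.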

The following is the code I have so far, which I need to refactor to use threading. It iterates over 4 arrays, needs to pass arguments to the function at each iteration, and appends all results to a final container:

    strategies = ['strategy_1', 'strategy_2']
    budgets = [90,100,110,120,130,140,150,160]
    formations=['343','352','433','442','451','532','541']
    models = ['model_1', 'model_2', 'model_3']

    all_teams = pd.DataFrame()

    for strategy in strategies:
        for budget in budgets:
            for formation in formations:
                for model in models:

                    team = function(strategy=strategy, 
                                    budget=budget, 
                                    curr_formation=formation,
                                    model=model)
                       
                    all_teams = all_teams.append(team, ignore_index=True, sort=False)\
                                         .reset_index(drop=True)\
                                         .copy()

Note: each function call makes API web requests.

What is the way to go with multithreading in this scenario?
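Because each call spends most of its time waiting on web requests (I/O-bound work), threads are usually enough here: CPython releases the GIL while a thread blocks on the network. A minimal sketch of the four nested loops refactored with `itertools.product` and `ThreadPool.starmap`, where `function` is a hypothetical stub returning a dict in place of the real API-calling function and the pool size of 8 is an arbitrary assumption:

```python
from itertools import product
from multiprocessing.dummy import Pool as ThreadPool

strategies = ['strategy_1', 'strategy_2']
budgets = [90, 100, 110, 120, 130, 140, 150, 160]
formations = ['343', '352', '433', '442', '451', '532', '541']
models = ['model_1', 'model_2', 'model_3']


def function(strategy, budget, curr_formation, model):
    # Hypothetical stub: the real function would make the API requests
    return {'strategy': strategy, 'budget': budget,
            'formation': curr_formation, 'model': model}


# Every (strategy, budget, formation, model) tuple, in the same order as the nested loops
combinations = list(product(strategies, budgets, formations, models))

with ThreadPool(8) as pool:  # 8 worker threads; tune to what the API tolerates
    # starmap unpacks each tuple into positional arguments and keeps input order
    teams = pool.starmap(function, combinations)

print(len(teams))  # 2 * 8 * 7 * 3 = 336
```

With pandas, `pd.DataFrame(teams)` (if each result is a dict) or `pd.concat(teams, ignore_index=True)` (if each result is a DataFrame) then builds `all_teams` in one step, instead of appending inside the loop.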

Python has the multiprocessing module, which can run multiple tasks in parallel, and inside each process you can have multiple threads or asyncio code.
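The async I/O option mentioned above can be sketched as follows, assuming the request can be awaited. Here `fetch_team` is a hypothetical coroutine and `asyncio.sleep` stands in for the network call (a real version would need an async HTTP client); a semaphore caps how many requests are in flight at once:

```python
import asyncio
from itertools import product

strategies = ['strategy_1', 'strategy_2']
budgets = [90, 100, 110, 120, 130, 140, 150, 160]
formations = ['343', '352', '433', '442', '451', '532', '541']
models = ['model_1', 'model_2', 'model_3']


async def fetch_team(sem, strategy, budget, formation, model):
    # Hypothetical coroutine: the semaphore limits concurrent "requests"
    async with sem:
        await asyncio.sleep(0.01)  # placeholder for an awaited HTTP request
        return (strategy, budget, formation, model)


async def main():
    sem = asyncio.Semaphore(20)  # at most 20 requests in flight
    tasks = [fetch_team(sem, *combo)
             for combo in product(strategies, budgets, formations, models)]
    # gather returns results in the order the coroutines were passed in
    return await asyncio.gather(*tasks)


results = asyncio.run(main())
print(len(results))  # 336
```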

Here is a working example that uses 3 processes and multithreading:

import multiprocessing
from multiprocessing import Queue
from threading import Thread

strategies = ['strategy_1', 'strategy_2']
budgets = [90, 100, 110, 120, 130, 140, 150, 160]
formations = ['343', '352', '433', '442', '451', '532', '541']
models = ['model_1', 'model_2', 'model_3']


# Placeholder for the real (API-calling) function; retrieve asynchronously
# if you want to speed things up further. It pushes its result onto the queue.
def function(q, strategy, budget, curr_formation, model):
    q.put("Team")


def runTask(model, q):
    # One process per model; one thread per (strategy, budget, formation)
    threads = []
    for strategy in strategies:
        for budget in budgets:
            for formation in formations:
                t = Thread(target=function, args=(q, strategy, budget, formation, model))
                t.start()
                threads.append(t)
    for t in threads:
        t.join()  # wait for every request in this process to finish


def main():
    # One shared queue; use one queue per process to reduce write locking
    q = Queue()
    processes = [multiprocessing.Process(target=runTask, args=(model, q))
                 for model in models]
    for p in processes:
        p.start()

    # Drain the queue before joining: a process that has put items on a queue
    # may not exit until they are consumed. Queue.qsize() is also unreliable
    # (and not implemented on macOS), so count the expected results instead.
    expected = len(models) * len(strategies) * len(budgets) * len(formations)
    all_teams = [q.get() for _ in range(expected)]

    for p in processes:
        p.join()

    print(all_teams)
    print(len(all_teams))


if __name__ == "__main__":
    main()

A useful article: Multiprocessing in Python | Set 2

This is one possible approach.

Note: Thread vs multiProcess. In that SO answer I ran the work through map; that approach will not work here, because map is limited in the number of arguments it passes.

  1. Run your nested for loops and build a list of parameter combinations, financial_options:
    financial_options = []
    for strategy in strategies:
        for budget in budgets:
            for formation in formations:
                for model in models:
                    financial_options.append([strategy, budget, formation, model])
    financial_options_len = len(financial_options)
  2. Create a new function that will handle the API calls:
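def access_url(url, parameter_list):
    # response = requests.get(url)  # the real API request goes here
    print(parameter_list)
    time.sleep(2)  # simulates the latency of a web request
    print("sleep done!")
    return "Hello"  # placeholder; return the parsed response (and parameter_list if needed)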
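Step 1 can also be written with `itertools.product`, which yields combinations in the same order as the nested loops (the rightmost iterable varies fastest). A small check, with the lists copied from the question:

```python
from itertools import product

strategies = ['strategy_1', 'strategy_2']
budgets = [90, 100, 110, 120, 130, 140, 150, 160]
formations = ['343', '352', '433', '442', '451', '532', '541']
models = ['model_1', 'model_2', 'model_3']

# Nested-loop version from step 1
financial_options = []
for strategy in strategies:
    for budget in budgets:
        for formation in formations:
            for model in models:
                financial_options.append([strategy, budget, formation, model])

# Equivalent comprehension: product yields tuples, so convert each to a list
product_options = [list(combo) for combo in product(strategies, budgets, formations, models)]

print(product_options == financial_options)  # True
print(len(product_options))  # 336
```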

Now run the threads with these parameter combinations. The complete program looks like this:

import concurrent.futures
import time

import pandas as pd
# import requests  # needed once access_url makes real HTTP calls


def access_url(url, parameter_list):
    # response = requests.get(url)  # the real API request goes here
    print(parameter_list)
    time.sleep(2)  # simulates the latency of a web request
    print("sleep done!")
    return "Hello"  # placeholder; return the parsed response instead


def multi_threading():
    test_url = "http://bla bla.com/"  # placeholder URL

    strategies = ['strategy_1', 'strategy_2']
    budgets = [90, 100, 110, 120, 130, 140, 150, 160]
    formations = ['343', '352', '433', '442', '451', '532', '541']
    models = ['model_1', 'model_2', 'model_3']

    start = time.perf_counter()  # start time for performance measurement

    # Build every (strategy, budget, formation, model) combination up front
    financial_options = []
    for strategy in strategies:
        for budget in budgets:
            for formation in formations:
                for model in models:
                    financial_options.append([strategy, budget, formation, model])
    print(f"Total options: {len(financial_options)}")

    decision_results = []
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Submit one task per option
        future_list = [executor.submit(access_url, test_url, option)
                       for option in financial_options]
        # as_completed yields futures as they finish, not in submission order
        for future in concurrent.futures.as_completed(future_list):
            decision_results.append(future.result())

    end = time.perf_counter()  # finish time for performance measurement
    print(f'Threads: Finished in {round(end - start, 2)} second(s)')

    df = pd.DataFrame(decision_results)
    df.to_csv("multithread_for.csv")
    return df, decision_results


if __name__ == "__main__":
    df, results = multi_threading()
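Because `as_completed` yields futures in completion order, each entry in `decision_results` cannot be matched back to the option that produced it. One way to keep that link is to map each future to its option. The sketch below assumes a hypothetical stub in place of `access_url` (with a shortened sleep), a reduced option list, a placeholder URL that is never fetched, and an arbitrary `max_workers=8`:

```python
import concurrent.futures
import time
from itertools import product


def access_url(url, parameter_list):
    # Hypothetical stub standing in for the real request
    time.sleep(0.01)
    return f"result for {parameter_list}"


options = [list(c) for c in product(['strategy_1', 'strategy_2'], [90, 100], ['343'], ['model_1'])]

rows = []
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    # Remember which option each future belongs to
    future_to_option = {executor.submit(access_url, "http://example.com/", option): option
                        for option in options}
    for future in concurrent.futures.as_completed(future_to_option):
        option = future_to_option[future]
        rows.append(option + [future.result()])  # parameters plus result in one row

rows.sort()  # optional: restore a deterministic order
print(len(rows))  # 4
```

`pd.DataFrame(rows, columns=['strategy', 'budget', 'formation', 'model', 'result'])` would then give one labelled row per option.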
