Simultaneous execution of different functions with multiprocessing in Python

I have a class that updates data and then analyzes it, continuously, once every 30 seconds. The problem is that, because the functions must stay separate (as you can see in the pseudocode), I am forced to download all of the data first and only then analyze it. My goal is to rewrite the code so that the analysis function parses each piece of data as soon as it becomes available. This workflow is illustrated in Pseudocode 02, but is it possible to achieve it without merging the functions? Thanks for your advice :)

Pseudocode 01:

import datetime, multiprocessing

class myclass:
    def __init__(self, list_of_symbols):
        self.last_update = 0
        self.list_of_symbols = list_of_symbols

    def get_data(self, symbol):
        request_data_to_api(symbol)

    def analyze_data(self, symbol):
        analyze(symbol)

    def run(self):
        while True:
            if (self.last_update == 0) or ((datetime.datetime.now() - self.last_update).seconds >= 30):
                self.last_update = datetime.datetime.now()

                # Update data as soon as possible
                pool = multiprocessing.Pool(20)
                pool.map(self.get_data, self.list_of_symbols)
                pool.close()
                pool.join()

                # Analyze data as soon as possible
                pool = multiprocessing.Pool(20)
                pool.map(self.analyze_data, self.list_of_symbols)
                pool.close()
                pool.join()

Pseudocode 02:

import datetime, multiprocessing

class myclass:
    def __init__(self, list_of_symbols):
        self.last_update = 0
        self.list_of_symbols = list_of_symbols

    def get_all_in_one(self, symbol):
        # Download and analyze in a single task, so each symbol is
        # analyzed as soon as its data arrives
        request_data_to_api(symbol)
        analyze(symbol)

    def run(self):
        while True:
            if (self.last_update == 0) or ((datetime.datetime.now() - self.last_update).seconds >= 30):
                self.last_update = datetime.datetime.now()

                pool = multiprocessing.Pool(20)
                pool.map(self.get_all_in_one, self.list_of_symbols)
                pool.close()
                pool.join()

I am assuming that the method get_data is an I/O-bound task, for which multithreading is better suited (and which can benefit from a much larger pool size), while analyze_data is CPU-intensive and should use multiprocessing, with the pool limited to the number of CPU cores you have. I have also modified your methods' signatures and return values where I thought it necessary.

The idea is (1) to create the pools once, outside the loop, since creating a pool can be expensive, and (2) to use imap_unordered (with a suitable chunksize argument for efficiency) so that results returned by get_data can be submitted to analyze_data as soon as they become available. For example, with 1,000 symbols and 8 cores, compute_chunksize below gives ceil(1000 / (4 * 8)) = 32, so each worker pulls 32 symbols from the queue at a time:

import datetime
from multiprocessing import Pool, cpu_count
from multiprocessing.pool import ThreadPool

class myclass:
    def __init__(self, list_of_symbols):
        self.last_update = 0
        self.list_of_symbols = list_of_symbols
        
    def get_data(self, symbol):
        data = request_data_to_api(symbol)
        return symbol, data
        
    def analyze_data(self, symbol, data):
        return analyze(symbol, data)
        
    def run(self): # self was missing in the original pseudocode

        def compute_chunksize(iterable_size, pool_size):
            chunksize, remainder = divmod(iterable_size, 4 * pool_size)
            if remainder:
                chunksize += 1
            return chunksize

        iterable_size = len(self.list_of_symbols)
        OPTIMAL_THREADPOOL_SIZE = 50 # Your guess is as good as mine
        threadpool_size = min(iterable_size, OPTIMAL_THREADPOOL_SIZE)
        thread_pool = ThreadPool(threadpool_size)
        processpool_size = min(iterable_size, cpu_count())
        process_pool = Pool(processpool_size) # use all the CPU cores that are available
        chunksize = compute_chunksize(iterable_size, processpool_size)

        while True:
            if (self.last_update == 0) or ((datetime.datetime.now() - self.last_update).seconds >= 30):
                self.last_update = datetime.datetime.now()

                # Update data as soon as possible
                results = thread_pool.imap_unordered(self.get_data, self.list_of_symbols, chunksize)
                # as results become available:
                async_results = []
                for symbol, data in results:
                    async_results.append(process_pool.apply_async(self.analyze_data, args=(symbol, data)))
                for async_result in async_results:
                    result = async_result.get() # return value from analyze_data
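
For completeness, here is one minimal way to exercise the class above. The stub bodies for request_data_to_api and analyze below are pure assumptions for illustration (the post never defines them); only their names and call signatures come from the code above:

import time, random

# Hypothetical stubs; the real implementations are not shown in the post.
def request_data_to_api(symbol):
    time.sleep(random.uniform(0.1, 0.5)) # simulate network latency (I/O-bound)
    return {"symbol": symbol, "price": random.random()}

def analyze(symbol, data):
    # simulate CPU-bound work on the downloaded data
    return symbol, sum(i * i for i in range(100_000)) * data["price"]

if __name__ == '__main__': # guard needed where workers are spawned (e.g. Windows)
    myclass(['AAPL', 'MSFT', 'GOOG', 'AMZN']).run() # loops forever, refreshing every 30 seconds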

Note

If get_data is also CPU-intensive, create just the multiprocessing pool and replace the call to thread_pool.imap_unordered(...) with process_pool.imap_unordered(...).
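
In sketch form, the changed lines inside run() would look like this (a fragment, not a standalone program; all names are as in the answer's code above):

# get_data is CPU-bound too: skip the thread pool and feed the symbols
# straight to the process pool.
results = process_pool.imap_unordered(self.get_data, self.list_of_symbols, chunksize)
async_results = []
for symbol, data in results:
    async_results.append(process_pool.apply_async(self.analyze_data, args=(symbol, data)))
for async_result in async_results:
    result = async_result.get() # return value from analyze_data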
