How do I encapsulate an imported module in a method for multithreading in Python?

Question

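I'm new to Python and I have a concurrency problem when using internal functions of imported libraries. My code calculates different kinds of variables and, in the last step, saves them into different files. I have the same problem when reading and when writing.

This is example code that works because it is sequential: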

import xarray as xr

# Both functions take self: they are methods of a class not shown here.
def read_concurrent_files(self):
    # Read the three variable types one after another.
    files_var_type1 = self.get_files('type1', '20200101', '20200127')
    files_var_type2 = self.get_files('type2', '20200101', '20200127')
    files_var_type3 = self.get_files('type3', '20200101', '20200127')

def get_files(self, varType, dateini, datefin):
    # get_file_list returns a list of file paths
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    return files_raw

But when I make these changes to the code to make it concurrent, it fails:

import xarray as xr
from multiprocessing.pool import ThreadPool

def read_concurrent_files(self):
    pool = ThreadPool(processes=3)

    # Submit the three reads to the thread pool at once.
    async_result1 = pool.apply_async(self.get_files, ('type1', '20200101', '20200127'))
    async_result2 = pool.apply_async(self.get_files, ('type2', '20200101', '20200127'))
    async_result3 = pool.apply_async(self.get_files, ('type3', '20200101', '20200127'))

    # get() blocks until the corresponding read has finished.
    files_var_type1 = async_result1.get()
    files_var_type2 = async_result2.get()
    files_var_type3 = async_result3.get()

def get_files(self, varType, dateini, datefin):
    # get_file_list returns a list of file paths
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    return files_raw

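The problem is in the xr.open_mfdataset call, which is not thread-safe (or so I think).

Is there a way to encapsulate the imported library in the method scope only?

I come from other languages, where this is easy to do by creating the instance inside the method or by using thread-safe objects.

Thanks a lot in advance!!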

Answer 1

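Being new to Python, I did not know that there are different kinds of worker pools. In the example above I used a thread pool, whose threads can be blocked by the GIL (Global Interpreter Lock). To avoid that, there is another kind of pool we can use, based on processes instead of threads. Here is an example: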

import os
import concurrent.futures

import xarray as xr

def get_xarray(self):
    tasks = []
    cpu_count = os.cpu_count()
    # One task per entry in self.files, each run in a separate worker process.
    # self must be picklable, since the bound method is sent to the workers.
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count) as executor:
        for files in self.files:
            tasks.append(executor.submit(self.get_xarray_by_file, files))

    # Leaving the with block waits for all tasks; collect results in submission order.
    results = []
    for task in tasks:
        results.append(task.result())
    era_raw = xr.merge(results, compat='override')

    return era_raw.persist().load()

def get_xarray_by_file(self, files):
    era_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    # Load the data into memory before it is sent back to the parent process.
    return era_raw.persist().load()

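In this case we use ProcessPoolExecutor:

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.

Now we can read grib2 files in parallel, or create nc or csv files from a dataframe, truly in parallel.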
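The picklability constraint mentioned above can be illustrated with a small standard-library sketch. The names square, is_picklable and run_in_processes are hypothetical helpers made up for this illustration; they are not part of xarray or of the code in this answer.

```python
import pickle
from concurrent.futures import ProcessPoolExecutor


def square(n):
    """Module-level function: pickled by reference, so it can be sent to a worker."""
    return n * n


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def run_in_processes(values, max_workers=2):
    """Apply square to each value in worker processes, keeping the input order."""
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(square, values))


if __name__ == "__main__":
    # A module-level function can be pickled; a lambda cannot.
    print(is_picklable(square))
    print(is_picklable(lambda n: n * n))
    # Arguments and return values travel between processes as pickles.
    print(run_in_processes([1, 2, 3]))
```

The same rule applies to arguments and return values: whatever is submitted to the pool or returned from a worker has to survive a round trip through pickle. The main guard keeps the pool from being started again when a worker process imports the module.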
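Writing several output files in parallel might look like the sketch below. It is only an illustration under simplifying assumptions: it uses the standard-library csv module and plain lists of rows instead of xarray or pandas objects, and write_csv and write_all are hypothetical helper names.

```python
import csv
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor


def write_csv(path, rows):
    """Write rows (a list of lists) to path and return the path."""
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return path


def write_all(tables, out_dir, max_workers=2):
    """Write each named table to its own CSV file, one task per file.

    tables maps a file stem to a list of rows; both are picklable.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        futures = [
            executor.submit(write_csv, os.path.join(out_dir, name + ".csv"), rows)
            for name, rows in tables.items()
        ]
        # result() re-raises any exception raised in the worker process.
        return [future.result() for future in futures]


if __name__ == "__main__":
    tables = {
        "type1": [["time", "value"], ["20200101", 1.0]],
        "type2": [["time", "value"], ["20200101", 2.0]],
    }
    with tempfile.TemporaryDirectory() as out_dir:
        for path in write_all(tables, out_dir):
            print(os.path.basename(path))
```

Each task writes to a different path, so the worker processes never share a file handle.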

Solution 1
0 accepted 2021-02-10 07:57:35
