
How to encapsulate an imported module into a method for multithreading in python?

I'm new to Python and I have a concurrency problem when using functions from an imported library. My code computes several kinds of variables and, in the last step, saves them to different files. I hit the same problem whether reading or writing.

This example code works because it runs sequentially:

import xarray as xr

def read_concurrent_files(self):

    files_var_type1 = get_files('type1','20200101','20200127')
    files_var_type2 = get_files('type2','20200101','20200127')
    files_var_type3 = get_files('type3','20200101','20200127')

def get_files(self, varType, dateini, datefin):

    # This method returns a list of file paths
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    return files_raw

But when I change the code to run concurrently, it fails:

import xarray as xr
from multiprocessing.pool import ThreadPool

def read_concurrent_files(self):

    pool = ThreadPool(processes=3)

    async_result1 = pool.apply_async(self.get_files, ('type1','20200101','20200127',))
    async_result2 = pool.apply_async(self.get_files, ('type2','20200101','20200127',))
    async_result3 = pool.apply_async(self.get_files, ('type3','20200101','20200127',))

    files_var_type1 = async_result1.get()
    files_var_type2 = async_result2.get()
    files_var_type3 = async_result3.get()

def get_files(self, varType, dateini, datefin):

    # This method returns a list of file paths
    files = self.get_file_list(varType, dateini, datefin)
    files_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    return files_raw

The problem is in the xr.open_mfdataset call, which (I believe) is not thread-safe.
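If the call really is not thread-safe, one common workaround is to serialize just that call with a module-level lock while the rest of each worker's code still overlaps. A minimal sketch, with made-up names; `len(files)` stands in for the real `xr.open_mfdataset(files, ...)` call:

```python
import threading
from multiprocessing.pool import ThreadPool

# Serializes only the suspect call; everything else in each
# worker thread still runs concurrently.
_open_lock = threading.Lock()

def open_files(files):
    with _open_lock:
        # Stand-in for the non-thread-safe call,
        # e.g. xr.open_mfdataset(files, ...)
        return len(files)

def read_all(file_groups):
    # Dispatch one open per thread; the lock keeps the opens
    # from running at the same time.
    with ThreadPool(processes=3) as pool:
        return pool.map(open_files, file_groups)
```

Note the trade-off: if most of each task's time is spent inside the locked call, the lock removes most of the benefit of threading.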

Is there a way to confine the imported library to the method's scope only?
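For reference: a function-local import does not give each caller (or thread) a private copy of the library. Python caches modules in `sys.modules`, so every `import` statement returns the same module object, and moving the import into the method does not isolate its state. A small demonstration (`json` stands in for `xarray` here so the snippet is self-contained):

```python
import sys

def get_module():
    # A function-local import does NOT create a private instance:
    # Python looks the module up in sys.modules, so every import
    # yields the same shared module object.
    import json  # stand-in for xarray
    return json

a = get_module()
b = get_module()
assert a is b                      # same object on every call
assert a is sys.modules["json"]    # and the same one module-level code sees
```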

In other languages I've used, this was easy: create the instance inside the method, or use thread-safe objects.

Thanks a lot in advance!

As I'm new to Python, I was unaware of the different kinds of workers we can create. In my example above I was using a ThreadPool, which is constrained by the GIL (Global Interpreter Lock). To avoid that, there is another kind of pool we can use. Here is an example:

import os
import concurrent.futures
import xarray as xr

def get_xarray(self):
    tasks = []
    cpu_count = os.cpu_count()
    with concurrent.futures.ProcessPoolExecutor(max_workers=cpu_count) as executor:
        for file in self.files:
            tasks.append(executor.submit(self.get_xarray_by_file, file))

    results = [task.result() for task in tasks]
    era_raw = xr.merge(results, compat='override')

    return era_raw.persist().load()

def get_xarray_by_file(self, files):
    era_raw = xr.open_mfdataset(files, engine='cfgrib',
        combine='nested', concat_dim='time', decode_coords=False, parallel=True)
    return era_raw.persist().load()

In this case, we use the ProcessPoolExecutor:

The ProcessPoolExecutor class is an Executor subclass that uses a pool of processes to execute calls asynchronously. ProcessPoolExecutor uses the multiprocessing module, which allows it to side-step the Global Interpreter Lock but also means that only picklable objects can be executed and returned.
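To illustrate the picklability requirement in isolation, here is a minimal, self-contained sketch (`square` and `run_parallel` are made-up names): the submitted callable must be defined at module top level so the worker processes can unpickle and call it.

```python
import os
import concurrent.futures

def square(n):
    # Must be a top-level function: ProcessPoolExecutor pickles the
    # callable and its arguments to send them to a worker process.
    return n * n

def run_parallel(values):
    workers = os.cpu_count() or 1
    with concurrent.futures.ProcessPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(square, v) for v in values]
        # Gather results in submission order.
        return [f.result() for f in futures]

if __name__ == "__main__":
    print(run_parallel([1, 2, 3]))  # [1, 4, 9]
```

A lambda or a nested function would fail here with a pickling error, which is why the helper in the answer above is an ordinary method rather than an inline closure.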

Now we can read grib2 files in parallel, or create nc or csv files from a dataframe, with true parallelism.
