
Python multiprocessing with shared memory and PyTorch data loader - RuntimeError: To use CUDA with multiprocessing, you must use the 'spawn' start method


I am trying to implement a program with a producer and a consumer class. The producer class reads a numpy array (an image) and puts it in shared memory, and the consumer class reads the numpy array from the shared memory and applies a PyTorch inference model to it.

Below is the shared memory creation code snippet.

import multiprocessing as multi_processing
import numpy as np

def create_shared_memory(self):
    type_code = "I"
    size = int(np.prod(self.image_frame_shape))
    frame_lock = multi_processing.Lock()

    shared_memory_array = multi_processing.Array(typecode_or_type = type_code, size_or_initializer = size, lock = frame_lock)
    buffered_array = np.frombuffer(shared_memory_array.get_obj(), dtype = type_code).reshape(self.image_frame_shape)

    shared_memory_object_tuple = (shared_memory_array, buffered_array)
    return shared_memory_object_tuple
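
For context, a minimal sketch of how the consumer side could copy a frame back out of this shared buffer (read_frame_from_shared_memory is a hypothetical helper, assuming the same type_code and image_frame_shape as above):

import numpy as np

def read_frame_from_shared_memory(shared_memory_array, image_frame_shape, type_code = "I"):
    # Hold the lock so the producer cannot write while the frame is being copied out.
    with shared_memory_array.get_lock():
        buffered_array = np.frombuffer(shared_memory_array.get_obj(), dtype = type_code)
        frame = buffered_array.reshape(image_frame_shape).copy()
    # The returned array is a private copy, safe to pass to the inference code.
    return frame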

I have created a PyTorch data loader with the below code snippet.

inference_data_loader = create_loader(
    InferCustomDataset(
        frame_list,
        self.validation_transforms,
        input_size = self.model_params['input_size'][1:]
    ),
    **self.model_params
)

And the InferCustomDataset class is as below.

import torch
from PIL import Image

class InferCustomDataset(torch.utils.data.Dataset):
    def __init__(self, imlist, custom_transforms = None, input_size = (224, 224)):
        self._imlist = imlist
        self.transform = custom_transforms
        self.input_size = input_size

    def __getitem__(self, idx):
        img = Image.fromarray(self._imlist[idx]).convert('RGB')
        img = img.resize(self.input_size)

        if self.transform is not None:
            img = self.transform(img)
        return img, torch.tensor(-1, dtype=torch.long)

    def __len__(self):
        return len(self._imlist)

When I try to iterate through the data loader, I am getting the below error/exception.

For loop: for image_data, _ in inference_data_loader:

Process ConsumerVHP-2:
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/mvi/modules/vac/src/consumer_video_handler_process.py", line 183, in run
    classes = self.infer_on_frame_list(buffered_images_list)
  File "/home/ubuntu/mvi/modules/vac/src/consumer_video_handler_process.py", line 92, in infer_on_frame_list
    for image_data, _ in inference_data_loader:
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 363, in __iter__
    self._iterator = self._get_iterator()
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 314, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 939, in __init__
    torch.cuda.current_device(),
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/cuda/__init__.py", line 481, in current_device
    _lazy_init()
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/cuda/__init__.py", line 206, in _lazy_init
    raise RuntimeError(
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Exception ignored in: <function _MultiProcessingDataLoaderIter.__del__ at 0x7f990c904a60>
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1358, in __del__
    self._shutdown_workers()
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1317, in _shutdown_workers
    self._mark_worker_as_unavailable(worker_id, shutdown=True)
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1258, in _mark_worker_as_unavailable
    assert self._workers_status[worker_id] or (self._persistent_workers and shutdown)
AttributeError: '_MultiProcessingDataLoaderIter' object has no attribute '_workers_status'

^CProcess PproducerVHProcess-1:
Traceback (most recent call last):
  File "/home/ubuntu/mvi/modules/vac/src/main.py", line 195, in <module>
    producer_reader_process.join()
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/multiprocessing/process.py", line 149, in join
    res = self._popen.wait(timeout)
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/multiprocessing/popen_fork.py", line 43, in wait
Traceback (most recent call last):
    return self.poll(os.WNOHANG if timeout == 0.0 else 0)
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/multiprocessing/popen_fork.py", line 27, in poll
  File "/home/ubuntu/anaconda3/envs/conda-pv-pytorch-2/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/home/ubuntu/mvi/modules/vac/src/producer_video_handler_process.py", line 48, in run
    ret = shared_memory_array.acquire()
KeyboardInterrupt
    pid, sts = os.waitpid(self.pid, flag)
KeyboardInterrupt

It is throwing the RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method error.
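
The error message asks for the 'spawn' start method to be selected before any CUDA work happens and before any process is started. A minimal sketch of what that usually looks like at the top of the entry point (torch.multiprocessing exposes the same call):

import multiprocessing as multi_processing

if __name__ == '__main__':
    # Must run once, before CUDA is initialized and before the first Process is created.
    multi_processing.set_start_method('spawn', force = True)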

In my main program, if I use set_start_method('spawn'), the consumer code just gets a numpy array of all zeros, and it looks like the consumer process is not getting the image (numpy array) from the shared memory.
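
A possible cause (an assumption, not verified against the actual ProducerVHProcess/ConsumerVHP classes): under 'spawn' a child process re-imports the module instead of inheriting the parent's memory, so a shared Array reached through a module-level global is re-created empty; the Array object itself has to be handed to the child, e.g. via the Process constructor. A self-contained sketch of that pattern:

import multiprocessing as multi_processing
import numpy as np

FRAME_SHAPE = (4, 4)  # placeholder shape for the sketch

def consumer(shared_memory_array, frame_shape):
    # The Array received as an argument refers to the same shared buffer as the parent's.
    with shared_memory_array.get_lock():
        frame = np.frombuffer(shared_memory_array.get_obj(), dtype = "I").reshape(frame_shape).copy()
    print(frame)

if __name__ == '__main__':
    multi_processing.set_start_method('spawn')
    shared_memory_array = multi_processing.Array("I", int(np.prod(FRAME_SHAPE)))
    # Write something non-zero in the parent so the child can prove it sees the shared data.
    with shared_memory_array.get_lock():
        np.frombuffer(shared_memory_array.get_obj(), dtype = "I")[:] = 7
    consumer_process = multi_processing.Process(target = consumer, args = (shared_memory_array, FRAME_SHAPE))
    consumer_process.start()
    consumer_process.join()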

I also tried setting "num_workers": 0, but I am getting the below error.

ValueError: persistent_workers option needs num_workers > 0
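
That error means the keyword arguments reaching the DataLoader (via **self.model_params here) still include persistent_workers=True; if the goal is to avoid worker processes entirely, the two options have to be set together. A minimal sketch with a plain torch DataLoader (batch_size is a placeholder; create_loader would need the equivalent settings passed through):

from torch.utils.data import DataLoader

inference_data_loader = DataLoader(
    InferCustomDataset(frame_list, self.validation_transforms, input_size = (224, 224)),
    batch_size = 1,                # placeholder value
    num_workers = 0,               # load in the consumer process itself, so nothing is forked
    persistent_workers = False,    # only meaningful (and only allowed) when num_workers > 0
)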

Could you let me know how to get the numpy array (image) that was sent to shared memory by the producer and apply the PyTorch inference in the consumer process?

I also tried the torch.multiprocessing module instead of Python's multiprocessing module, but that also resulted in the same error.

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

I would appreciate your suggestions/help on how to fix this. Thank you.

The following change fixed this issue.

In my if __name__ == '__main__' block, I was calling the create_shared_memory() method.

shared_mem_handler = SharedMemoryHandler()
shared_memory_object_tuple = shared_mem_handler.create_shared_memory()

I moved this code outside of the __main__ block and placed it just under my import statements. This fixed the issue.
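
In other words, the shared memory is now created at import time, roughly like this (a sketch of the resulting layout of main.py, not the full file; the import path for SharedMemoryHandler is hypothetical):

# main.py
from shared_memory_handler import SharedMemoryHandler  # hypothetical module path

# Shared memory is created at module level, right after the imports.
shared_mem_handler = SharedMemoryHandler()
shared_memory_object_tuple = shared_mem_handler.create_shared_memory()

if __name__ == '__main__':
    # Producer and consumer processes are created and started here,
    # using shared_memory_object_tuple.
    ...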
