
Using tensor.share_memory_() vs multiprocessing.Queue in PyTorch when training model across multiple processes

I'm using the multiprocessing package in PyTorch to split the training across multiple processes. My x and y train and test data are CUDA tensors. I'm trying to understand the difference between using tensor.share_memory_() and a multiprocessing.Queue to share CUDA tensors. Which is preferred, and why?

Here's my current code using tensor.share_memory_(). What changes should I make?

import torch
import torch.nn as nn
import torch.optim as optim
import torch.multiprocessing as mp
from sklearn.model_selection import train_test_split


def train(model, features, target, epochs=1000):
    X_train, x_test, Y_train, y_test = train_test_split(features,
                                                        target,
                                                        test_size=0.4,
                                                        random_state=0)

    Xtrain_ = torch.from_numpy(X_train.values).float().share_memory_()
    Xtest_ = torch.from_numpy(x_test.values).float().share_memory_()

    Ytrain_ = (torch.from_numpy(Y_train.values).view(1, -1)[0]).share_memory_()
    Ytest_ = (torch.from_numpy(y_test.values).view(1, -1)[0]).share_memory_()

    optimizer = optim.Adam(model.parameters(), lr=0.01)
    loss_fn = nn.NLLLoss()

    for epoch in range(epochs):
        # training code here
        pass
# train() target function ends here


if __name__ == '__main__':
    mp.set_start_method('spawn')

    model = Net()
    model.share_memory()

    processes = []

    for rank in range(1):
        p = mp.Process(target=train, args=(model, features, target))
        p.start()
        processes.append(p)

    for p in processes:
        p.join()

Env details: Python 3 and Linux

They are the same. torch.multiprocessing.Queue uses tensor.share_memory_() internally.
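To see this equivalence in action, here is a minimal standalone sketch (my own illustration, using small CPU tensors rather than your training data): share_memory_() moves a tensor's storage into shared memory in place, and a tensor sent through a torch.multiprocessing.Queue is transferred via shared-memory handles rather than copied through a pickle byte stream.

```python
import torch
import torch.multiprocessing as mp

if __name__ == "__main__":
    mp.set_start_method("spawn", force=True)

    # Explicit approach: move the tensor's storage to shared memory in place.
    t = torch.zeros(4)
    print(t.is_shared())   # False
    t.share_memory_()
    print(t.is_shared())   # True

    # Queue approach: torch.multiprocessing registers custom reducers that
    # ship tensors as shared-memory handles, so no explicit call is needed.
    q = mp.Queue()
    q.put(torch.ones(4))
    received = q.get()
    print(received.is_shared())
```

So whichever form is more convenient for your program structure is fine; the underlying sharing mechanism is the same.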

