
Is there a way to replace the 'allreduce_hook' used for DDP (DistributedDataParallel) in Pytorch?

I know that Pytorch DDP uses 'allreduce_hook' as the default communication hook. Is there a way to replace this default hook with 'quantization_pertensor_hook' or 'powerSGD_hook'? There is official Pytorch documentation introducing the DDP communication hooks, but I am still confused about how to do this in practice.

This is how I initiate the process group and create the DDP model:

import torch.distributed as dist
import torch.nn as nn

# set up the default process group and wrap the model for distributed training
dist.init_process_group(backend='nccl', init_method='env://', world_size=args.world_size, rank=rank)
model = nn.parallel.DistributedDataParallel(model, device_ids=[0])

Is there any way to register the hook that I want based on this code?

This could do the job:


import torch.distributed as dist
import torch.distributed.algorithms.ddp_comm_hooks.powerSGD_hook as powerSGD
import torch.nn as nn

dist.init_process_group(backend='nccl', init_method='env://', world_size=args.world_size, rank=rank)
model = nn.parallel.DistributedDataParallel(model, device_ids=[0])

# register the PowerSGD gradient-compression hook in place of the default allreduce hook
state = powerSGD.PowerSGDState(process_group=None, matrix_approximation_rank=1, start_powerSGD_iter=10, min_compression_rate=0.5)
model.register_comm_hook(state, powerSGD.powerSGD_hook)
...
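
The registration pattern is the same for the per-tensor quantization hook you mentioned. A minimal sketch, assuming the quantization_hooks module from torch.distributed.algorithms.ddp_comm_hooks, where the hook's state is simply the process group (passing None uses the default group):

import torch.distributed.algorithms.ddp_comm_hooks.quantization_hooks as quant_hooks

# state for the quantization hooks is the process group; None means the default group
model.register_comm_hook(None, quant_hooks.quantization_pertensor_hook)

Note that register_comm_hook can only be called once on a given DDP model, so pick a single hook before training starts.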
