
Is there any difference between tensor2tensor and pytorch in view of memory?

I'm trying to train a seq2seq model (Transformer) with both PyTorch and tensor2tensor. When using tensor2tensor, the batch size can be as large as 1024, while the PyTorch model shows a CUDA out-of-memory error even with a batch size of 8.

Is there any technique used in tensor2tensor to make better use of memory?

If anyone knows, please tell me.

Thanks in advance.

In Tensor2Tensor, the batch size is by default specified as the number of tokens (subwords) per single GPU. This allows a batch to contain either many short sequences (sentences) or fewer long ones. Most other toolkits use a fixed batch size specified as a number of sequences. Either way, it is a good idea to cap the maximum sentence length during training at a reasonable value, to prevent out-of-memory errors and excessive padding. Some toolkits instead specify the total batch size across all GPU cards.
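The idea of a token-based batch size can be sketched in a few lines. The following is a minimal illustration, not Tensor2Tensor's actual implementation: sequences are sorted by length so that each batch mixes similar lengths, and a batch is emitted once adding one more sequence would push its padded size (number of sequences × longest sequence) over the token budget. The function name and the budget of 256 tokens are illustrative choices.

```python
def batch_by_tokens(sequences, max_tokens_per_batch=1024):
    """Group tokenized sequences into batches capped by total padded token count.

    A minimal sketch of token-based batching: each batch's cost is
    len(batch) * max_sequence_length, i.e. its size after padding.
    """
    batches = []
    batch, max_len = [], 0
    for seq in sorted(sequences, key=len):
        new_max = max(max_len, len(seq))
        # Padded cost if this sequence were added to the current batch.
        if batch and (len(batch) + 1) * new_max > max_tokens_per_batch:
            batches.append(batch)          # emit the full batch
            batch, max_len = [seq], len(seq)
        else:
            batch.append(seq)
            max_len = new_max
    if batch:
        batches.append(batch)
    return batches

# Short sentences pack densely into one batch; long ones get small batches.
sents = [[0] * n for n in (5, 5, 6, 40, 120, 3, 7, 200)]
for b in batch_by_tokens(sents, max_tokens_per_batch=256):
    print(len(b), "sequences, max length", max(len(s) for s in b))
```

With a fixed per-sequence batch size, a batch of eight 200-token sentences costs 1600 padded tokens while a batch of eight 5-token sentences costs only 40; a token budget keeps the memory footprint of every batch roughly constant.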

Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license; if you need to republish, please credit this site or the original source. For any questions, contact: yoyou2525@163.com.

 