
Is it safe to truncate torchaudio's loaded 16-bit audios to `float16` from `float32`?

I have multiple WAV files with 16 bits of depth/precision. torchaudio.info(...) recognizes this, giving me:

precision = {int} 16

Yet when I use torchaudio.load(...), I get a float32 dtype for the resulting tensor. With a tensor called audio, I know that I can do audio.half() to truncate it to 16 bits, reducing the memory usage of my dataset. But is this an operation that will preserve the precision of all possible original values? I'm not lowering the dtype's precision below the original audio's precision, but there may be a good reason I'm unaware of as to why torchaudio still returns float32.
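
For context, a minimal sketch of the behaviour described above (the file path is a placeholder, and the exact metadata field name varies across torchaudio versions):

```python
import torchaudio

# "speech.wav" is a placeholder path for any 16-bit PCM WAV file
info = torchaudio.info("speech.wav")
print(info)  # reports the 16-bit precision (bits_per_sample on newer versions)

# load() decodes and, with the default normalize=True, returns float32 in [-1.0, 1.0]
audio, sample_rate = torchaudio.load("speech.wav")
print(audio.dtype)       # torch.float32

audio_fp16 = audio.half()  # cast to float16, halving the tensor's memory
print(audio_fp16.dtype)    # torch.float16
```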

I would say it's returned as float32 because this is PyTorch's default datatype. So if you create any models with weights, they'll be float32 as well. Therefore, the inputs will be incompatible with the model if you make the conversion on the input data. (Edit: or it will silently convert your data to 32 bit anyway, to make it compatible with your model. Not sure which PyTorch opts for, but TensorFlow definitely throws an error.)
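
In recent PyTorch versions, a mixed-dtype forward pass raises an error rather than converting silently; a minimal sketch (arbitrary layer shapes, assumed CPU execution):

```python
import torch

model = torch.nn.Linear(4, 2)   # weights default to float32
x = torch.randn(1, 4).half()    # float16 input, as after audio.half()

try:
    model(x)
except RuntimeError as e:
    print(e)  # dtype mismatch, e.g. "mat1 and mat2 must have the same dtype"
```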

Look at setting the default datatype to float16 before creating any models, if you're looking to make small models: https://pytorch.org/docs/stable/generated/torch.set_default_dtype.html
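
A minimal sketch of that approach; note that the linked docs primarily document torch.float32 and torch.float64, so treat a float16 default as experimental:

```python
import torch

# Assumption: float16 as a default dtype; the docs mainly cover float32/float64
torch.set_default_dtype(torch.float16)

model = torch.nn.Linear(4, 2)   # created after the call, so weights are float16
print(model.weight.dtype)       # torch.float16
```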

HOWEVER, note that you will lose roughly 5 bits of precision if you convert a 16-bit int (which, as you diagnosed, the number actually is, even though it's represented as a 32-bit float) to a 16-bit float. This is because 5 of float16's bits are used for the exponent and 1 for the sign, leaving just 10 bits for the significand that carries the actual digits of the number.
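
You can observe this directly. A small sketch: with 10 explicit significand bits (11 counting the implicit leading bit), float16 represents integers exactly only up to 2048, so larger 16-bit sample values get rounded:

```python
import torch

# Raw 16-bit sample values: exact in float32, rounded in float16
samples = torch.tensor([2047.0, 2049.0, 32767.0])
print(samples.half().float())   # tensor([ 2047.,  2048., 32768.])

# The same relative rounding hits torchaudio's normalized [-1, 1] floats:
# near full scale, neighbouring int16 levels collapse to one float16 value.
```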

I would just keep it at float32 if you're not particularly memory-constrained.
