
How to limit the size of the features vector in Wav2Vec?

I'm trying to extract feature vectors from short wav (audio) files with wav2vec, using Hugging Face Transformers.

However, no matter which approach I use to control the output size, the results do not meet my requirements.

Ideally, I'd like all of the vectors to have the same length (e.g. 60K). I tried to get that with the following command:

feature_extractor(input_audio, sampling_rate=16000, return_tensors="np",
                  padding="max_length", max_length=60000).input_values

That command set a lower bound on the data size by padding every vector to at least 60K values, but I was surprised to see vectors with 120K values as well.

Then I removed the padding parameter, hoping to obtain vectors with no padding but an upper bound of 60K. This was based on the max_length documentation:

Maximum length of the returned list and optionally padding length

So I executed this line:

feature_extractor(input_audio, sampling_rate=16000, return_tensors="np",
                  max_length=60000).input_values

Unexpectedly, I received vectors ranging in length from 20K to 120K. They were not limited at all.


To reproduce my results, I've included a snippet of code and a link to the relevant audio data.

import librosa
from pathlib import Path
from transformers import Wav2Vec2FeatureExtractor

dataset_path = "path/to/dataset"  # placeholder: folder that contains the .wav files

# Collect every .wav file under the dataset folder, recursively.
audio_files = list(Path(dataset_path).glob('**/*.wav'))
feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained('facebook/wav2vec2-base-960h')

for file in audio_files:
    # Load as mono audio resampled to 16 kHz.
    input_audio, _ = librosa.load(file, sr=16000)
    features_with_padding = feature_extractor(
        input_audio, sampling_rate=16000, return_tensors="np",
        padding="max_length", max_length=60000).input_values
    features_without_padding = feature_extractor(
        input_audio, sampling_rate=16000, return_tensors="np",
        max_length=60000).input_values
    print(features_with_padding.shape, features_without_padding.shape)

In this drive folder, I attached 2 wav files that produce vectors of about 80K values.

How can I create a fixed-size feature vector with a wav2vec transformer?

At the moment, truncation is not supported by the feature extractor in Hugging Face, so if you "pad" to a max_length that is shorter than the sample length, nothing changes, since no padding is needed.

However, we should definitely add truncation functionality to Transformers, as it is very important.
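
In the meantime, one possible workaround is to fix the length of the raw waveform yourself before passing it to the feature extractor. Below is a minimal sketch, assuming 1-D mono audio like the arrays librosa.load returns. The helper name fix_length and its pad_value parameter are hypothetical and not part of Transformers; the 60000 target is taken from the question.

```python
import numpy as np


def fix_length(audio, target_len=60000, pad_value=0.0):
    """Return a 1-D float32 array with exactly target_len samples.

    Hypothetical helper: longer clips are cut at the end,
    shorter clips are right-padded with pad_value.
    """
    audio = np.asarray(audio, dtype=np.float32)
    if audio.shape[0] >= target_len:
        # Truncate: keep only the first target_len samples.
        return audio[:target_len]
    # Pad: append the missing samples on the right.
    missing = target_len - audio.shape[0]
    return np.pad(audio, (0, missing), mode="constant", constant_values=pad_value)


# Synthetic stand-ins for a 20K-sample clip and a 120K-sample clip.
short_clip = np.ones(20000, dtype=np.float32)
long_clip = np.ones(120000, dtype=np.float32)
print(fix_length(short_clip).shape, fix_length(long_clip).shape)
```

Calling fix_length(input_audio) before feature_extractor(...) should give every file the same input_values shape. Later Transformers releases also appear to accept a truncation=True argument on the feature extractor call, so it is worth checking the documentation for your installed version before relying on a manual workaround.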
