How to train Wav2vec2 XLSR using a local custom dataset

Question

I want to train a speech-to-text model for Danish with wav2vec2 XLSR (a transformer-based model). The usual recommendation is to train on Common Voice with the help of the datasets library, but Common Voice contains very little Danish data. I would therefore like to train the model on my own custom data, but I have not been able to find any clear documentation for this. Can anybody explain how to do it, step by step?

Answer 1

I suggest extending the Common Voice (CV) Danish subset with your own dataset. Analyse the CV dataset first and make your data look like the CV corpus. The things that matter here are the file extension (.wav, .mp3 ...), the sample type (float32, int ...), the audio lengths and, of course, the transcription format. Do not make your corpus sparse.
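As a rough illustration of that analysis step, the sketch below inspects WAV files with Python's standard `wave` module and lists what would need fixing. It assumes PCM WAV input only (MP3 files would need another reader). The 16 kHz target reflects the sampling rate the XLSR checkpoints expect; the mono requirement and the 20-second limit are assumptions chosen for illustration, and `describe_wav` and `find_problems` are hypothetical helper names.

```python
import wave


def describe_wav(path):
    """Read basic properties of a PCM WAV file without loading the samples."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": wav.getnframes() / rate,
        }


def find_problems(info, target_rate=16000, max_seconds=20.0):
    """List the ways a clip deviates from the assumed target format.

    target_rate and max_seconds are illustrative defaults, not values
    prescribed by Common Voice or by wav2vec2 tooling.
    """
    problems = []
    if info["sample_rate"] != target_rate:
        problems.append(f"resample to {target_rate} Hz")
    if info["channels"] != 1:
        problems.append("downmix to mono")
    if info["duration_s"] > max_seconds:
        problems.append(f"split or drop: longer than {max_seconds:.0f} s")
    return problems
```

Running `find_problems(describe_wav(path))` over every file in the custom set gives a quick picture of how far the data is from the format of the corpus it will be merged into.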

Place your data into the CV corpus folder and load the dataset. You should then be able to fine-tune the model on the extended data using existing code.
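One way to do the merge is to append your own rows to a copy of the CV metadata file. The sketch below assumes a tab-separated CV file whose header includes `path` and `sentence` columns, and that the audio files have already been copied into the corpus clips folder. The function name `extend_cv_tsv` and the choice to leave all other columns empty are assumptions made for illustration.

```python
import csv


def extend_cv_tsv(cv_tsv_path, custom_rows, out_path):
    """Write a copy of a CV-style TSV with custom rows appended.

    custom_rows is a list of dicts with "path" and "sentence" keys.
    Columns that exist in the CV file but not in a custom row are
    written as empty strings. Returns the total number of data rows.
    """
    with open(cv_tsv_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src, delimiter="\t")
        fieldnames = reader.fieldnames
        rows = list(reader)

    for row in custom_rows:
        rows.append({"path": row["path"], "sentence": row["sentence"]})

    with open(out_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(
            dst,
            fieldnames=fieldnames,
            delimiter="\t",
            restval="",          # fill columns the custom rows do not have
            lineterminator="\n",
        )
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Writing to a separate output file keeps the original CV metadata untouched, so the extended corpus can be rebuilt if the custom data changes.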

Do not create a completely new corpus if you are not a wav2vec expert.

A note: you should get reasonable results even with little data. What WER did you achieve, and what is your target? Hyper-parameter tuning may be the first thing to look at, rather than more data.
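For reference, WER (word error rate) is the word-level edit distance between the reference transcription and the model output, divided by the number of reference words. The function below is a minimal plain-Python sketch of that definition; it splits on whitespace and applies no text normalisation, which is a simplification, and `word_error_rate` is a hypothetical name.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)
```

Measuring this on a held-out set before and after each change makes it possible to tell whether more data or different hyper-parameters actually helped.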

Answer 2

Try https://github.com/jonatasgrosman/huggingsound. This tool makes it easy to fine-tune wav2vec2 models using local custom data.

Answer 3

I've built a tool to help me fine-tune wav2vec2 models using custom data. Maybe it can help you too: https://github.com/jonatasgrosman/huggingsound.

You can install it using: pip install huggingsound

To fine-tune the XLSR model using a custom dataset, you'll need to do something like this:

from huggingsound import SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your custom train data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data,
    token_set=token_set,
)
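For a real dataset, the `train_data` list and the token list would normally be built from a file rather than typed by hand. The sketch below is one possible way to do that with the standard library only. It assumes a hypothetical CSV manifest with `path` and `transcription` columns, lowercases the text as an illustrative normalisation step, and drops the space character from the token list to mirror the example above, whose list has no space entry. Deriving tokens from the transcriptions matters for Danish because the letters æ, ø and å are not in the a-z list used above.

```python
import csv


def load_train_data(manifest_path):
    """Read a CSV manifest into the list-of-dicts shape used above.

    The manifest layout (columns "path" and "transcription") is an
    assumption of this sketch, not a format required by the library.
    """
    with open(manifest_path, newline="", encoding="utf-8") as f:
        return [
            {
                "path": row["path"],
                "transcription": row["transcription"].strip().lower(),
            }
            for row in csv.DictReader(f)
        ]


def derive_tokens(train_data):
    """Collect every character that occurs in the transcriptions."""
    chars = set()
    for item in train_data:
        chars.update(item["transcription"])
    chars.discard(" ")  # assumption: word boundaries are handled separately
    return sorted(chars)
```

The results can then replace the hand-written values in the example: `train_data = load_train_data("manifest.csv")` and `token_set = TokenSet(derive_tokens(train_data))`, where `manifest.csv` is a placeholder file name.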

Answer 1: score 0, posted 2022-06-29 08:22:01
Answer 2: score 0, posted 2022-07-17 14:29:41
Answer 3: score -1, posted 2022-07-25 17:44:25
