How to Train Wav2vec2 XLSR With a Local Custom Dataset

I want to train a Danish speech-to-text model with wav2vec2 XLSR (a transformer-based model). The usual recommendation is to train on Common Voice with the help of the datasets library, but Common Voice contains very little Danish data. I would like to train the model on my own custom data instead, but I have not found any clear documentation for this. Can anybody explain, step by step, how to do it?

I suggest extending the Common Voice (CV) Danish subset with your own dataset. Analyse the CV dataset first and make your data look like the CV corpus. The things that matter here are the file extension (.wav, .mp3 ...), the sample type (float32, int ...), the audio lengths and, of course, the transcription format. Do not make your corpus sparse.
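As a rough illustration of the "analyse first" step, the sketch below lists WAV clips whose sample rate, channel count, or length differs from a target. It uses only the standard-library `wave` module, which reads integer PCM WAV files and will not open .mp3 or float32 WAV files. The function names, the 15-second limit, and the mono requirement are assumptions made for this example; the 16 kHz default reflects the sampling rate wav2vec2 XLSR was pretrained on.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(str(path), "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
        return {
            "path": str(path),
            "sample_rate": rate,
            "channels": wav_file.getnchannels(),
            "bits_per_sample": wav_file.getsampwidth() * 8,
            "duration_s": frames / rate,
        }


def find_outliers(folder, expected_rate=16000, max_duration_s=15.0):
    """List clips that are not mono, not at the expected rate, or too long.

    The thresholds are illustrative assumptions; adjust them to match
    whatever the rest of your corpus looks like.
    """
    outliers = []
    for path in sorted(Path(folder).glob("*.wav")):
        info = describe_wav(path)
        if (
            info["sample_rate"] != expected_rate
            or info["channels"] != 1
            or info["duration_s"] > max_duration_s
        ):
            outliers.append(info)
    return outliers
```

Running `find_outliers("my_clips")` on a folder of recordings (the folder name is hypothetical) gives a list of files that probably need resampling, downmixing, or splitting before they are mixed with the CV clips.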

Place your data into the CV corpus folder and load the dataset. You should then be able to fine-tune the model on the extended data using existing code.
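One way to "place your data into the CV corpus" is to append rows for your clips to one of the corpus's tab-separated split files. The sketch below assumes the file has a header row containing `path` and `sentence` columns, which is how I recall Common Voice releases being laid out; check the header of your own copy before relying on it. The function name is made up for this example, and all other columns are left empty for the new rows.

```python
from pathlib import Path


def append_clips_to_tsv(tsv_path, clips):
    """Append (filename, sentence) pairs to a tab-separated file with a header row.

    Assumes the header contains 'path' and 'sentence' columns; every other
    column is written as an empty field for the new rows.
    """
    tsv_path = Path(tsv_path)
    text = tsv_path.read_text(encoding="utf-8")
    header = text.split("\n", 1)[0].split("\t")
    path_col = header.index("path")
    sentence_col = header.index("sentence")

    lines = []
    for filename, sentence in clips:
        if "\t" in sentence or "\n" in sentence:
            raise ValueError(f"Transcription for {filename} contains a tab or newline")
        row = [""] * len(header)
        row[path_col] = filename
        row[sentence_col] = sentence.strip()
        lines.append("\t".join(row))

    with tsv_path.open("a", encoding="utf-8") as handle:
        # Make sure the first appended row starts on its own line.
        if text and not text.endswith("\n"):
            handle.write("\n")
        for line in lines:
            handle.write(line + "\n")
    return len(lines)
```

The audio files themselves still have to be copied into the corpus's clips folder under the same filenames, and in the same audio format as the existing clips.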

Do not create a completely new corpus if you are not a wav2vec expert.

A note: you should be able to get reasonable results with little data. What WER did you achieve, and what is your target? Hyper-parameter tuning, rather than more data, may be the first thing to look at.
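For reference, WER (word error rate) is the word-level edit distance between the reference transcription and the model output, divided by the number of reference words. A minimal, dependency-free sketch follows; the function name is my own, and evaluation libraries usually add text normalisation that is not done here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref_words)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion over four reference words, giving 0.5. Measuring this on a held-out set before and after each change makes it easier to tell whether more data or different hyper-parameters actually helped.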

Try https://github.com/jonatasgrosman/huggingsound. This tool makes it easy to fine-tune wav2vec2 models using local custom data.

I've built a tool to help me fine-tune wav2vec2 models using custom data. Maybe it can help you too: https://github.com/jonatasgrosman/huggingsound

You can install it using: pip install huggingsound

To fine-tune the XLSR model using a custom dataset, you'll need to do something like this:

from huggingsound import SpeechRecognitionModel, TokenSet

model = SpeechRecognitionModel("facebook/wav2vec2-large-xlsr-53")
output_dir = "my/finetuned/model/output/dir"

# first of all, you need to define your model's token set
# however, the token set is only needed for non-finetuned models
# if you pass a new token set for an already finetuned model, it'll be ignored during training
tokens = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z", "'"]
token_set = TokenSet(tokens)

# define your custom train data
train_data = [
    {"path": "/path/to/sagan.mp3", "transcription": "extraordinary claims require extraordinary evidence"},
    {"path": "/path/to/asimov.wav", "transcription": "violence is the last refuge of the incompetent"},
]

# and finally, fine-tune your model
model.finetune(
    output_dir, 
    train_data=train_data,
    token_set=token_set,
)
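The a-z token list in the snippet above covers English only; Danish transcriptions also contain letters such as æ, ø and å. The sketch below shows one possible way to build the train_data list from a manifest file and derive the token list from the transcriptions themselves. The manifest layout (a CSV with `path` and `transcription` columns) and both function names are assumptions made for this example, not part of huggingsound. Whitespace is left out of the token list because the example above does not list it either.

```python
import csv


def load_manifest(csv_path):
    """Read a CSV with 'path' and 'transcription' columns into a list of dicts.

    The column names are an assumed layout for your own manifest file.
    Transcriptions are stripped and lowercased.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return [
            {
                "path": row["path"],
                "transcription": row["transcription"].strip().lower(),
            }
            for row in csv.DictReader(handle)
        ]


def collect_tokens(transcriptions):
    """Return the sorted list of distinct non-whitespace characters used."""
    characters = set()
    for text in transcriptions:
        characters.update(ch for ch in text.lower() if not ch.isspace())
    return sorted(characters)
```

The results would then take the place of the hand-written values above: `train_data = load_manifest("my_manifest.csv")` (hypothetical filename) and `tokens = collect_tokens(item["transcription"] for item in train_data)`. It is worth inspecting the resulting list before training, since stray punctuation or digits in the transcriptions will end up as tokens too.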
