Unable to train a Wav2vec XLSR model using Common Voice data

Question

I am trying to train a transformer ASR model with wav2vec XLSR for Danish, but whenever I try to pull the Danish dataset with the datasets library, it raises an error. Notebook link

Error log:

ValueError: BuilderConfig da not found. Available: ['ab', 'ar', 'as', 'br', 'ca', 'cnh', 'cs', 'cv', 'cy', 'de', 'dv', 'el', 'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fr', 'fy-NL', 'ga-IE', 'hi', 'hsb', 'hu', 'ia', 'id', 'it', 'ja', 'ka', 'kab', 'ky', 'lg', 'lt', 'lv', 'mn', 'mt', 'nl', 'or', 'pa-IN', 'pl', 'pt', 'rm-sursilv', 'rm-vallader', 'ro', 'ru', 'rw', 'sah', 'sl', 'sv-SE', 'ta', 'th', 'tr', 'tt', 'uk', 'vi', 'vot', 'zh-CN', 'zh-HK', 'zh-TW']
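The error means the requested config name is not among the ones this dataset script packages. A minimal sketch of checking that up front is shown here; `matching_configs` and `fetch_config_names` are hypothetical helper names, the hard-coded list is copied from the error message, and the use of `get_dataset_config_names` from the datasets library to fetch the list live (it needs network access) is an assumption about how one might do it.

```python
from typing import List

# Config names copied from the "Available:" list in the error message.
AVAILABLE_CONFIGS = [
    'ab', 'ar', 'as', 'br', 'ca', 'cnh', 'cs', 'cv', 'cy', 'de', 'dv', 'el',
    'en', 'eo', 'es', 'et', 'eu', 'fa', 'fi', 'fr', 'fy-NL', 'ga-IE', 'hi',
    'hsb', 'hu', 'ia', 'id', 'it', 'ja', 'ka', 'kab', 'ky', 'lg', 'lt', 'lv',
    'mn', 'mt', 'nl', 'or', 'pa-IN', 'pl', 'pt', 'rm-sursilv', 'rm-vallader',
    'ro', 'ru', 'rw', 'sah', 'sl', 'sv-SE', 'ta', 'th', 'tr', 'tt', 'uk',
    'vi', 'vot', 'zh-CN', 'zh-HK', 'zh-TW',
]


def matching_configs(lang: str, available: List[str]) -> List[str]:
    """Hypothetical helper: return config names that match a language code.

    An exact match wins; otherwise regional variants (for example 'sv-SE'
    for 'sv') are returned. An empty list means the language is not packaged.
    """
    if lang in available:
        return [lang]
    return [name for name in available if name.startswith(lang + "-")]


def fetch_config_names(dataset_name: str = "common_voice") -> List[str]:
    """Ask the datasets library for the current config list (needs network)."""
    from datasets import get_dataset_config_names  # imported lazily
    return get_dataset_config_names(dataset_name)


print(matching_configs("da", AVAILABLE_CONFIGS))  # [] : Danish is not packaged
print(matching_configs("sv", AVAILABLE_CONFIGS))  # ['sv-SE']
```

Passing `fetch_config_names()` instead of the hard-coded list would run the same check against whatever version of the library is installed.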

Answer 1

I checked it for you.

The Danish subset of the Common Voice corpus is supported in these releases:

Common Voice Corpus 8.0
Common Voice Corpus 9.0

However, Hugging Face's datasets library (version 2.2.1) uses version 6.1.0 of the corpus. You can verify this yourself by loading any subset of the corpus and printing the dataset info, as follows:

Code

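from datasets import load_dataset

# Load the German subset; without a split argument this returns a DatasetDict.
dataset_de = load_dataset("common_voice", "de")
# The dataset info (including the corpus version) is attached to each split.
print(dataset_de["train"].info)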

Output

Downloading and preparing dataset common_voice/de (download: 21.68 GiB, 
generated: 137.78 MiB, post-processed: Unknown size, total: 21.82 GiB) to 
/root/.cache/huggingface/datasets/common_voice/de/6.1.0/
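The version is the last component of the cache path in that log. If you would rather not download 21.68 GiB just to read it, a lighter check is sketched here. It assumes `load_dataset_builder` from the datasets library, which should resolve the builder without preparing the data; `version_from_cache_dir` and `corpus_version` are hypothetical helper names, not part of the library.

```python
from pathlib import PurePosixPath


def version_from_cache_dir(cache_dir: str) -> str:
    """Return the last path component of a datasets cache directory.

    In the log the cache path ends in <config>/<version>/, so the last
    component is the corpus version.
    """
    return PurePosixPath(cache_dir).name


def corpus_version(dataset_name: str, config: str) -> str:
    """Read the packaged corpus version from the builder (needs network)."""
    from datasets import load_dataset_builder  # imported lazily
    builder = load_dataset_builder(dataset_name, config)
    return str(builder.info.version)


# Parse the version out of the cache path printed in the log.
print(version_from_cache_dir(
    "/root/.cache/huggingface/datasets/common_voice/de/6.1.0/"
))
```

Calling `corpus_version("common_voice", "de")` is the network-backed equivalent; it is left uncalled here so the snippet runs offline.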

See the corpus details

See the library

You should either wait for a new release of the library or open a request on its repository.

Answer 1: accepted on 2022-05-16 08:28:49
