
How do we generate the first target word in machine translation?

Original post: 2022-07-25 03:57:11 | Tags: tokenize, machine-translation

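I am studying machine translation with Transformers. As far as I know, a Transformer model predicts the next word of the target sentence from the source sentence and the previously generated target words. However, in the MarianMT model (or T5), I found that the tokenizer has no start-of-sentence token (<cls> or <s>). I think some token is needed to start predicting the first word of the target sentence.

Can anyone explain how the MarianMT model predicts the first word of the target sentence?

Thank you.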

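1 Answer

From the documentation:

Model starts generating with pad_token_id (which has 0 as a token_embedding) as the prefix (Bart uses <s/>)

So it does not need a dedicated SOS token: the padding token serves as the first decoder input token during training, and the same token seeds the decoder at generation time.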
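To make that concrete, here is a minimal pure-Python sketch of the idea, not the library's actual code. The token ids (`PAD_ID`, `EOS_ID`), the helper names, and the toy "model" are all assumptions made for illustration. In Hugging Face Transformers the start token is usually read from the model config (`decoder_start_token_id`), which for Marian is set to the pad token id.

```python
from typing import Callable, List

PAD_ID = 0     # hypothetical id; a real model reads this from its config
EOS_ID = 1     # hypothetical end-of-sentence id
IGNORE = -100  # label value commonly used to mask positions out of the loss


def shift_tokens_right(labels: List[int], decoder_start_id: int = PAD_ID) -> List[int]:
    """Build teacher-forcing decoder inputs: start token, then labels minus the last one."""
    shifted = [decoder_start_id] + labels[:-1]
    # Masked label positions cannot be embedded, so replace them with the pad id.
    return [PAD_ID if tok == IGNORE else tok for tok in shifted]


def greedy_decode(next_token: Callable[[List[int]], int], max_len: int = 10) -> List[int]:
    """Generate greedily, seeding the decoder with the pad token as the prefix."""
    decoder_input = [PAD_ID]
    while len(decoder_input) <= max_len:
        tok = next_token(decoder_input)
        decoder_input.append(tok)
        if tok == EOS_ID:
            break
    return decoder_input[1:]  # drop the pad prefix before returning


# Toy stand-in for a trained decoder: it simply replays a fixed target sequence.
target = [17, 42, 8, EOS_ID]
toy_model = lambda prefix: target[len(prefix) - 1]

decoder_inputs = shift_tokens_right(target)  # what the decoder sees during training
generated = greedy_decode(toy_model)         # what the decoder emits at inference

print(decoder_inputs)
print(generated)
```

During training, the decoder input at position 0 is the pad token and the label at position 0 is the first real target word, so the model learns "pad prefix → first word". At inference, the same pad token is fed in first, which is why no separate `<s>` token is needed in the tokenizer.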


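Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.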
