How do we generate the first target word in machine translation?

Original post: 2022-07-25 03:57:11 · Tags: tokenize, machine-translation

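I am learning about the machine translation task using Transformers. As far as I know, a Transformer model predicts the next word of the target sentence from the source sentence and the previously generated target words. However, in the MarianMT model (or T5), I found that the tokenizer has no start-of-sentence token (<cls> or <s>). I would expect such a token to be needed to start predicting the first word of the target sentence.

Can anyone explain how the MarianMT model predicts the first word of the target sentence?

Thank you.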

1 Answer

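From the documentation:

Model starts generating with pad_token_id (which has 0 as a token_embedding) as the prefix (Bart uses <s/>)

So it does not need a dedicated SOS token: the padding token plays that role. It is the first decoder input token during training, and generation starts from it as well.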
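To make the mechanism concrete, here is a minimal sketch in plain Python. It is not the library's code: the token ids, the `shift_right` and `greedy_decode` helpers, and the toy lookup table standing in for a trained decoder are all hypothetical. It only illustrates how a padding token can seed the decoder, both when building teacher-forcing inputs and when generating.

```python
from typing import Callable, List

PAD_ID = 0  # hypothetical id of the padding token (doubles as the decoder start token)
EOS_ID = 1  # hypothetical id of the end-of-sentence token


def shift_right(labels: List[int], start_id: int = PAD_ID) -> List[int]:
    """Build decoder inputs for teacher forcing.

    The start token is prepended and the last label is dropped, so the
    decoder sees position t-1 when it has to predict position t.
    """
    return [start_id] + labels[:-1]


def greedy_decode(next_token: Callable[[List[int]], int], max_len: int = 10) -> List[int]:
    """Generate one token at a time, seeding the decoder with the pad token."""
    decoder_input = [PAD_ID]
    while len(decoder_input) <= max_len:
        token = next_token(decoder_input)
        decoder_input.append(token)
        if token == EOS_ID:
            break
    return decoder_input[1:]  # strip the pad prefix before returning


# Toy stand-in for a trained decoder: maps the last token to the next one.
TOY_TABLE = {PAD_ID: 7, 7: 8, 8: 9, 9: EOS_ID}


def toy_next_token(prefix: List[int]) -> int:
    return TOY_TABLE[prefix[-1]]


labels = [7, 8, 9, EOS_ID]
decoder_inputs = shift_right(labels)        # training: pad token comes first
generated = greedy_decode(toy_next_token)   # inference: starts from the pad token
print(decoder_inputs)
print(generated)
```

With a real MarianMT checkpoint loaded through the transformers library, one would expect `model.config.decoder_start_token_id` to equal `tokenizer.pad_token_id`. That is worth verifying for the specific checkpoint rather than taking for granted.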



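Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce them, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.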
