
How to make a language translator in Python?

I'm making a small dictionary in Python, and I'm now stuck on the part where I want to translate a whole sentence. I have a dictionary set up with a few words:

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}

How do I translate a sentence like "hey my name is Mario" so that the output looks like this:

ciao mio nome is Mario

Words that are not in the dictionary should be printed as written, and words that can be found in the dictionary should be translated.

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}

s = "hey my name is Mario"

# split s into a list of words, and search the dict for each. 
# if the word is not found, keep the original word
translated = " ".join(english_italian.get(word, word) for word in s.split(" "))

print(translated)

Output:

ciao mio nome is Mario

If you want to leave words that are not in the dictionary untranslated:

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}
s = "hello my name is Mario"

translated = ' '.join([
    english_italian[english_word] 
    if english_word in english_italian else english_word
    for english_word in s.split(' ')])

print(translated)

Output:

hello mio nome is Mario

The first problem you'll have to tackle is turning a sentence (a string) into a list of words, which could be a question by itself. Once you have this list (let's call it words), loop through it and translate each word to its Italian counterpart. Of course, just joining the translated words together will produce a sentence that differs from the original, because punctuation, spacing, and so on are ignored. For example, "hey, my name is Mario" will translate to "ciao mio nome is Mario" (note the missing comma).
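The effect described above can be seen in a small sketch. It assumes that a regular expression (`\w+`) is an acceptable way to pull words out of the string. The helper name `sentence_to_words` is hypothetical, and other tokenizers would work as well.

```python
import re

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}

def sentence_to_words(sentence):
    # Hypothetical helper: keep runs of word characters (letters, digits,
    # underscore) and drop punctuation and whitespace.
    return re.findall(r"\w+", sentence)

words = sentence_to_words("hey, my name is Mario")
translated = " ".join(english_italian.get(word, word) for word in words)
print(translated)  # ciao mio nome is Mario (the comma is lost)
```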

To solve this, you could instead replace each translatable word in the original sentence with its Italian counterpart, which also keeps the words that have no translation. This produces the following code:

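import re

def sentence_to_words(sentence):
    # Assumed implementation; the answer leaves this helper undefined
    return re.findall(r"\w+", sentence)

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}
sentence = "hey, my name is Mario"
words = sentence_to_words(sentence)  # ["hey", "my", "name", "is", "Mario"]

for word in words:
    if word in english_italian:
        sentence = sentence.replace(word, english_italian[word])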
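
Note that `str.replace` matches substrings, so a short key such as "my" would also be replaced inside a longer word like "mystery". One possible way around this, sketched here as an assumption rather than as part of the original answer, is to let `re.sub` visit each whole word and look it up in the dictionary. Punctuation and spacing are left untouched because only the matched words are rewritten. The function name `translate` is hypothetical.

```python
import re

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}

def translate(sentence, mapping):
    # Hypothetical helper: replace each whole word found in `mapping`,
    # leaving unknown words, punctuation and whitespace as they are.
    return re.sub(r"\w+", lambda m: mapping.get(m.group(), m.group()), sentence)

print(translate("hey, my name is Mario", english_italian))  # ciao, mio nome is Mario
print(translate("my mystery", english_italian))             # mio mystery
```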

One optimization would be to remove duplicates from the words list first, so you don't translate the same word more than once.
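
A minimal sketch of that deduplication, assuming the word list has already been extracted (the `words` list below is written out by hand for illustration): `dict.fromkeys` drops repeated words while keeping their first-seen order, so each word triggers at most one `replace` call.

```python
english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}
sentence = "hey hey, my name is Mario"
# Assumed result of the word-splitting step for the sentence above
words = ["hey", "hey", "my", "name", "is", "Mario"]

unique_words = list(dict.fromkeys(words))  # order-preserving deduplication
for word in unique_words:
    if word in english_italian:
        # str.replace already rewrites every occurrence of the word
        sentence = sentence.replace(word, english_italian[word])

print(sentence)  # ciao ciao, mio nome is Mario
```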

All in all, translating word by word like this will not work very well (think of verb conjugation, differences in word order such as the position of the subject and verbs, other grammatical differences, etc.).

You could try the following:

english_italian = {"hey": "ciao", "my": "mio", "name": "nome"}

sentence = "hey my name is Mario"

sentence_list = sentence.split(" ")

# We get this: ['hey', 'my', 'name', 'is', 'Mario']

translation_list = [english_italian[word] if word in english_italian else word for word in sentence_list]

trans_sentence = " ".join(translation_list)

print(trans_sentence)

Output:

ciao mio nome is Mario

We take the sentence, split it into words, and translate each word if it is in the dictionary; otherwise we keep the word as is. Then we join the list of translated words and print it.

Hope it helps.

