Apertium translator. Is there a way to get the original phrase?

Is there a way in the Apertium translator to get the original phrase for a translation?

I.e., get something like:

phrase: {
  original: { Hola, buenos días},
  translated: {Hello, good morning}
}

I need this in order to build a mechanism for improving the translations.

If you're sending a corpus through the command-line interface, e.g.

xzcat corpus.sme.xz | sed 's/$/ ./' | apertium -f html-noent sme-nob > translated.nob.mt

then you can simply try

xzcat corpus.sme.xz | paste - translated.nob.mt

afterwards to get each input line next to its output line. This assumes you want to split things on newlines. The sed appends " ." to every line so that words aren't moved across newlines (rules tend not to move words across sentence boundaries).

This is fast, but it's a bit hacky and there are many edge cases.
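One of those edge cases is that paste pairs lines purely by position, so a single merged or split line in the output silently shifts every pair after it. A minimal sketch of a guard, assuming both files are plain text with one segment per line (pair_lines is a hypothetical helper name, not part of Apertium):

```shell
#!/bin/sh
# pair_lines SRC MT
# Print "source<TAB>translation" for each line, but refuse to do so when
# the two files have different line counts, because paste would then
# misalign the pairs without any warning.
pair_lines() {
    src=$1
    mt=$2
    # $(( ... )) strips the padding some wc implementations add
    n_src=$(($(wc -l < "$src")))
    n_mt=$(($(wc -l < "$mt")))
    if [ "$n_src" -ne "$n_mt" ]; then
        echo "line count mismatch: $src has $n_src, $mt has $n_mt" >&2
        return 1
    fi
    paste "$src" "$mt"
}
```

With the corpus decompressed first (xzcat corpus.sme.xz > corpus.sme), it would be called as pair_lines corpus.sme translated.nob.mt. Equal line counts do not prove the alignment is right, only that it is not obviously wrong.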


If you want more control, one way would be to install the JSON API locally and send one request at a time.

If you have a recent Debian/Ubuntu (or are using one of the Apertium repos), you can get the API with

sudo apt install apertium-apy
sudo systemctl start apertium-apy   # start it right now
sudo systemctl enable apertium-apy  # let it start on next boot

And then you can translate like this:

$ echo 'Jeg liker ikke ansjos' | curl --data-urlencode 'q@-' 'localhost:2737/translate?langpair=nob|nno'
{"responseDetails": null, "responseData": {"translatedText": "Eg likar ikkje ansjos"}, "responseStatus": 200}

(or from JavaScript with standard Ajax requests; some docs are at http://wiki.apertium.org/wiki/Apertium-apy/Debian and http://wiki.apertium.org/wiki/Apertium-apy#Usage)

Note that apertium-apy by default serves the pairs that are in /usr/share/apertium/modes; if you start it manually (instead of through systemctl), you can point it at a different path.


If you want to produce the JSON format from your example, the easiest way is to use jq (sudo apt install jq), e.g.

$ orig="Jeg liker ikke ansjos"
$ echo "$orig" \
  | curl -Ss --data-urlencode 'q@-' 'localhost:2737/translate?langpair=nob|nno' \
  | jq --arg orig "$orig" '{phrase: {original: $orig, translated: .responseData.translatedText}}'
{
  "phrase": {
    "original": "Jeg liker ikke ansjos",
    "translated": "Eg likar ikkje ansjos"
  }
}

or on a corpus:

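xzcat corpus.nob.xz | while IFS= read -r orig; do
  # --arg hands the line to jq as a properly escaped JSON string,
  # so quotes or backslashes in the corpus don't break the filter
  echo "$orig" \
    | curl -Ss --data-urlencode 'q@-' 'localhost:2737/translate?langpair=nob|nno' \
    | jq --arg orig "$orig" '{phrase: {original: $orig, translated: .responseData.translatedText}}'
done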

(A simple test on 500 lines showed this taking 23.7 s of wall-clock time, while the paste version took 5.5 s.)
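If the speed of the paste version matters but the JSON shape is still wanted, the tab-separated pairs can probably be converted in a single jq pass rather than one request per line. A rough sketch, assuming no source or translated line contains a literal tab (to_phrase_json is a hypothetical helper name):

```shell
#!/bin/sh
# to_phrase_json
# Read "original<TAB>translated" lines on stdin and print one compact
# JSON object per line. -R makes jq treat each input line as a raw
# string instead of trying to parse it as JSON.
to_phrase_json() {
    jq -R -c 'split("\t") | {phrase: {original: .[0], translated: .[1]}}'
}
```

It would be used as xzcat corpus.sme.xz | paste - translated.nob.mt | to_phrase_json. All the alignment caveats of the paste approach still apply, since jq only reshapes whatever pairs it is given.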
