Splitting a bilingual text into two parts using Python

input text: "gut, wird gemacht right, will do (inf)"
output text:
gut, wird gemacht
right, will do (inf)

input text: "gut, mache ich right, will do (inf) or I'll do that"
output text:
gut, mache ich
right, will do (inf) or I'll do that

input text: "wie mans macht, ists verkehrt whatever you do is wrong"
output text:
wie mans macht, ists verkehrt
whatever you do is wrong

First off, please try to solve the problem yourself. As @Julien points out, no one will write code for you.

To answer your question, you need an algorithm that can detect which language a piece of text is written in and report how confident it is (e.g., counting letter frequencies has a surprisingly good hit rate, or you could use a word database and compare each word against it).
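A minimal sketch of the word-database variant, assuming a tiny hand-made vocabulary. `GERMAN_WORDS`, `ENGLISH_WORDS`, `tokenize` and `language_confidence` are hypothetical names for illustration, not part of any library; a real solution would load a full dictionary or use a letter-frequency model instead. The result is the average per-word score in [-1, 1], whose magnitude serves as a rough confidence.

```python
import re

# Toy, hypothetical vocabularies for illustration only; replace with
# real per-language word lists for actual use.
GERMAN_WORDS = {"gut", "wird", "gemacht", "mache", "ich", "wie", "mans",
                "macht", "ists", "verkehrt", "und", "nicht", "das"}
ENGLISH_WORDS = {"right", "will", "do", "inf", "or", "i'll", "that",
                 "whatever", "you", "is", "wrong", "and", "not", "the"}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words, keeping apostrophes (e.g. "i'll")."""
    return re.findall(r"[\w']+", text.lower())


def word_score(word: str) -> int:
    """+1 if the word looks German, -1 if English, 0 if unknown (or in both)."""
    return int(word in GERMAN_WORDS) - int(word in ENGLISH_WORDS)


def language_confidence(text: str) -> float:
    """Average word score: near +1 means German, near -1 means English."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(word_score(w) for w in words) / len(words)
```

For example, `language_confidence("gut, wird gemacht")` returns `1.0` and `language_confidence("right, will do (inf)")` returns `-1.0`; mixed or unrecognised text lands closer to 0.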

The next step is to choose an algorithm to find the most likely split. You could, for instance, evaluate each word individually, or try splitting the text at several positions to find which one works best.
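The "try every split position" idea can be sketched as follows, assuming the German part always comes first and there is exactly one switch point. The word lists and the helpers `word_score` and `best_split` are hypothetical placeholders, not taken from the original answer; the code scores each whitespace-separated token and keeps the cut where the left side looks most German and the right side most English.

```python
# Toy, hypothetical word lists standing in for a real per-language dictionary.
GERMAN_WORDS = {"gut", "wird", "gemacht", "mache", "ich", "wie", "mans",
                "macht", "ists", "verkehrt"}
ENGLISH_WORDS = {"right", "will", "do", "inf", "or", "i'll", "that",
                 "whatever", "you", "is", "wrong"}


def word_score(token: str) -> int:
    """+1 for a known German word, -1 for a known English word, else 0."""
    word = token.strip(".,;:!?()\"").lower()
    return int(word in GERMAN_WORDS) - int(word in ENGLISH_WORDS)


def best_split(text: str) -> tuple[str, str]:
    """Try every word boundary and return (german_part, english_part)."""
    tokens = text.split()
    if len(tokens) < 2:
        return text, ""
    scores = [word_score(t) for t in tokens]

    def split_quality(i: int) -> int:
        # Reward German words before the cut and English words after it.
        return sum(scores[:i]) - sum(scores[i:])

    cut = max(range(1, len(tokens)), key=split_quality)
    return " ".join(tokens[:cut]), " ".join(tokens[cut:])


for line in [
    "gut, wird gemacht right, will do (inf)",
    "gut, mache ich right, will do (inf) or I'll do that",
    "wie mans macht, ists verkehrt whatever you do is wrong",
]:
    german, english = best_split(line)
    print(german)
    print(english)
    print()
```

Because unknown words score 0, the cut is decided only by words the lists recognise, so on unseen text the quality depends entirely on how good the language scorer is.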

Once you have that set up, it's just a matter of trying different approaches until you get the accuracy you need.

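Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.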

© 2020-2024 STACKOOM.COM