Transliterate sentence written in 2 different scripts to a single script
I am able to convert Hindi text written in the Latin (English) script back to Devanagari:
import codecs, string
from indic_transliteration import sanscript
from indic_transliteration.sanscript import SchemeMap, SCHEMES, transliterate

def is_hindi(character):
    maxchar = max(character)
    if u'\u0900' <= maxchar <= u'\u097f':
        return character
    else:
        print(transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI))

character = 'bakrya'
is_hindi(character)
Output:
बक्र्य
But if I try something like this, nothing gets converted:
character = 'Bakrya विकणे आहे'
is_hindi(character)
Output:
Bakrya विकणे आहे
Expected Output:
बक्र्य विकणे आहे
I also tried the Polyglot library, but I am getting similar results with it.
Preface: I know nothing of Devanagari, so you will have to bear with me.
First, consider your function. It can return two things: character or None (print just outputs something; it doesn't actually return a value). That means the output in your first example comes from the print call, not from Python evaluating your last statement.
Then, consider your second test string. Because max(character) picks the highest code point in the whole string, the function sees that there is some Devanagari text and just returns the string unchanged. What you have to do, if this transliteration works as I think it does, is apply the function to every word in your text.
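The two points above can be checked without the transliteration library. This is only a small sketch; the helper name show is made up for illustration and is not part of the original code.

```python
def show(value):
    # print writes to stdout; the function itself returns None implicitly
    print(value)

result = show('bakrya')
print(result is None)  # the call produced output, but its return value is None

def max_in_devanagari(text):
    # Same range test as in is_hindi: only the highest code point is checked
    return '\u0900' <= max(text) <= '\u097f'

mixed = 'Bakrya विकणे आहे'
latin = 'bakrya'

in_range_mixed = max_in_devanagari(mixed)  # a Devanagari sign outranks every Latin letter
in_range_latin = max_in_devanagari(latin)  # the highest character here is 'y'
print(in_range_mixed, in_range_latin)
```

A single Devanagari character anywhere in the string is enough to make the range test succeed, which is why the mixed sentence comes back untouched.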
I modified your function to:
def is_hindi(character):
    maxchar = max(character)
    if u'\u0900' <= maxchar <= u'\u097f':
        return character
    else:
        return transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI)
and modified your call to:
' '.join(map(is_hindi, character.split()))
Let me explain, from right to left. First, I split your test string into separate words with .split(). Then, I map the new is_hindi function over the resulting list (i.e., apply the function to every element). Last, I join the converted words with a space to produce the final string.
Output:
'बक्र्य विकणे आहे'
If I may suggest, I would place this splitting/mapping functionality into another function, to make things easier to apply.
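One possible shape for such a wrapper is sketched below. The names contains_devanagari and convert_mixed are hypothetical, and the transliteration step is passed in as a callable so the sketch runs on its own; with the real library you would pass something that calls transliterate(word, sanscript.ITRANS, sanscript.DEVANAGARI).

```python
from typing import Callable

DEVANAGARI_START = '\u0900'
DEVANAGARI_END = '\u097f'

def contains_devanagari(word: str) -> bool:
    # Same test as in is_hindi: the highest code point lies in the Devanagari block
    return DEVANAGARI_START <= max(word) <= DEVANAGARI_END

def convert_mixed(text: str, to_devanagari: Callable[[str], str]) -> str:
    # Split on whitespace, convert only the words that are not already
    # Devanagari, and join the result back with single spaces.
    words = text.split()
    converted = [w if contains_devanagari(w) else to_devanagari(w) for w in words]
    return ' '.join(converted)

# Stand-in converter (an assumption for this demo, not real transliteration):
# it only marks which words would have been sent to the library.
marked = convert_mixed('bakrya विकणे आहे', lambda w: '<' + w + '>')
print(marked)

# With indic_transliteration installed, the call might look like:
# convert_mixed(text, lambda w: transliterate(w, sanscript.ITRANS, sanscript.DEVANAGARI))
```

Note that .split() collapses runs of whitespace, so the original spacing is not preserved; if that matters, a different splitting strategy would be needed.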
Edit: I had to change your test string from 'Bakrya विकणे आहे' to 'bakrya विकणे आहे' because the capital B wasn't being converted. For generic text, this can be fixed with character.lower().