How to convert Unicode-encoded data into Devanagari (Hindi) text

I am receiving SMS messages in the Devanagari (Hindi) script from my mobile phone in my desktop program, but the data is displayed in an encoded form (e.g. 091A09470924002009240924), which I found out is Unicode. Is there an existing library that will let me convert this to Hindi text? If not, how do I go about writing a method for this? I'm using C#.

Use the System.Text.Encoding class. It has a GetChars(byte[]) method. You will probably also need an appropriate font, since some Hindi symbols can be written in several ways.
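A minimal sketch of that suggestion, assuming the sample value from the question is a run of hexadecimal digits in which every four digits form one big-endian UTF-16 code unit (the question does not confirm the exact format). The names SmsHexDecoder and DecodeUcs2Hex are hypothetical; Convert.ToByte and Encoding.BigEndianUnicode are standard .NET APIs.

```csharp
using System;
using System.Text;

public static class SmsHexDecoder
{
    // Assumption: the input is hex text, four hex digits per UTF-16 code unit,
    // with the high byte first (big-endian).
    public static string DecodeUcs2Hex(string hex)
    {
        if (hex == null) throw new ArgumentNullException(nameof(hex));
        if (hex.Length % 4 != 0)
            throw new ArgumentException("Expected four hex digits per UTF-16 code unit.", nameof(hex));

        // Turn each pair of hex digits into one byte.
        byte[] bytes = new byte[hex.Length / 2];
        for (int i = 0; i < bytes.Length; i++)
        {
            bytes[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
        }

        // Interpret the bytes as big-endian UTF-16 text.
        return Encoding.BigEndianUnicode.GetString(bytes);
    }
}
```

Under that assumption, DecodeUcs2Hex("091A09470924002009240924") yields the code points U+091A U+0947 U+0924 U+0020 U+0924 U+0924, which all lie in the Devanagari block apart from the space.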

Here's a code snippet I used for converting Georgian Unicode text to its Latin equivalent.

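using System;
using System.Text;

// Latin equivalents of the 33 Georgian letters, in alphabet order.
string[] charset = new string[33] { "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l", "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q", "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h" };
string unicodeString = "აბ, - გდ";
string latin_string = "";
// Encoding.Unicode is UTF-16 little-endian: two bytes per char, low byte at the even index.
byte[] unicodeBytes = Encoding.Unicode.GetBytes(unicodeString);
for (int p = 0; p < unicodeBytes.Length / 2; p++)
{
    // Only the low byte is inspected, so this assumes the input holds just Georgian letters and ASCII.
    if (unicodeBytes[p * 2] > 207 && unicodeBytes[p * 2] < 241)
        latin_string += charset[unicodeBytes[p * 2] - 208];   // Georgian letter: look up its Latin equivalent
    else
        latin_string += Convert.ToChar(unicodeBytes[p * 2]).ToString();   // anything else: the byte is the char code
}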

Explaining only the necessary part:

Encoding.Unicode.GetBytes(unicodeString) returns an array of bytes whose length is 2 * unicodeString.Length, so every letter of unicodeString is represented by a pair of bytes.

The bytes at the even indexes of unicodeBytes hold the value that identifies the letter to decode (the low byte of each UTF-16 code unit, since Encoding.Unicode is little-endian). For the letters of the Georgian alphabet these values run from 208 to 240 (33 in total). So if the unicodeBytes value was in the range [208, 240], I used the charset string array to get the Latin equivalent; otherwise the unicodeBytes value was just the character code.
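To make the byte layout concrete, here is a small sketch that lists the byte pairs Encoding.Unicode.GetBytes produces for a string. The class and method names (Utf16ByteLayout, DescribeBytePairs) are made up for illustration and are not part of the original answer.

```csharp
using System;
using System.Text;

public static class Utf16ByteLayout
{
    // Returns one "LL HH" entry per UTF-16 code unit: the low byte (even index)
    // followed by the high byte (odd index), both as two-digit hex.
    public static string[] DescribeBytePairs(string text)
    {
        byte[] bytes = Encoding.Unicode.GetBytes(text);
        string[] pairs = new string[bytes.Length / 2];
        for (int p = 0; p < pairs.Length; p++)
        {
            byte low = bytes[p * 2];
            byte high = bytes[p * 2 + 1];
            pairs[p] = low.ToString("X2") + " " + high.ToString("X2");
        }
        return pairs;
    }
}
```

For the two-character string "ა," this gives "D0 10" and "2C 00": the Georgian letter ა is U+10D0, so its low byte is 0xD0 (208), while the comma is U+002C with a high byte of zero.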

I don't know if there is a library for it, but this method should give you a basic idea of how to write your own converter.
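As one possible variation on the same lookup idea, the table can be indexed by the char value itself rather than by the low byte, which avoids confusing a Georgian letter with an unrelated character that happens to share the same low byte. This is only a sketch: GeorgianTransliterator and ToLatin are hypothetical names, and the table is the one from the snippet above.

```csharp
using System;
using System.Text;

public static class GeorgianTransliterator
{
    // Same 33-entry table as in the snippet above; index 0 corresponds to U+10D0.
    private static readonly string[] Charset =
    {
        "a", "b", "g", "d", "e", "v", "z", "T", "i", "k", "l",
        "m", "n", "o", "p", "J", "r", "s", "t", "u", "f", "q",
        "R", "y", "S", "C", "c", "Z", "w", "W", "x", "j", "h"
    };

    public static string ToLatin(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Georgian letters U+10D0..U+10F0 are replaced through the table;
            // every other character is copied unchanged.
            if (c >= '\u10D0' && c <= '\u10F0')
                result.Append(Charset[c - '\u10D0']);
            else
                result.Append(c);
        }
        return result.ToString();
    }
}
```

With the sample string from the snippet, ToLatin("აბ, - გდ") returns "ab, - gd".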

Thanks for the responses, they helped me find the exact solution: http://social.msdn.microsoft.com/Forums/en/netfxbcl/thread/12a3558d-fe48-44fd-840e-03facfd9c944
