How do I retrieve the Unicode decimal representation of the chars in a string containing Hindi text?

Question

I am using C# in Visual Studio 2010 to convert text into Unicode values. For example, I have a string abc = "मेरा". There are 4 characters in this string, and I need the Unicode value of all four characters. Please help me.

Answer 1

Since a .NET char is a UTF-16 code unit, which for code points in the BMP is the Unicode character itself, you can simply enumerate all the characters in a string:

var abc = "मेरा";

foreach (var c in abc)
{
    Console.WriteLine((int)c);
}

resulting in the values 2350, 2375, 2352 and 2366, printed one per line.

Answer 2

When you write code like string abc = "मेरा"; you already have it as Unicode (specifically, UTF-16), so you don't have to convert anything. If you want to access the individual characters, you can do that using a normal index: e.g. abc[1] is े (DEVANAGARI VOWEL SIGN E).

If you want to see the numeric representations of those characters, just cast them to integers. For example,

abc.Select(c => (int)c)

gives the sequence of numbers 2350, 2375, 2352, 2366. If you want to see the hexadecimal representation of those numbers, use ToString() with a format string:

abc.Select(c => ((int)c).ToString("x4"))

returns the sequence of strings "092e", "0947", "0930", "093e".

Note that when I said numeric representations, I actually meant their UTF-16 encoding. For characters in the Basic Multilingual Plane (BMP), this is the same as their Unicode code point. The vast majority of characters in use lie in the BMP, including the 4 Hindi characters presented here.

If you want to handle characters in other planes too, you could use code like the following.

byte[] bytes = Encoding.UTF32.GetBytes(abc);

int codePointCount = bytes.Length / 4;

int[] codePoints = new int[codePointCount];

for (int i = 0; i < codePointCount; i++)
    codePoints[i] = BitConverter.ToInt32(bytes, i * 4);

Since UTF-32 encodes every (21-bit) code point directly as a single 32-bit value, this gives you the code points. (Maybe there is a more straightforward solution, but I haven't found one.)
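One possibly more direct route is to walk the string and let char.ConvertToUtf32 combine surrogate pairs. The following is only a sketch, assuming the string is well-formed UTF-16 (no unpaired surrogates); the names CodePointSketch and GetCodePoints are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

static class CodePointSketch
{
    // Hypothetical helper: returns one int per Unicode code point.
    public static List<int> GetCodePoints(string text)
    {
        var codePoints = new List<int>();
        for (int i = 0; i < text.Length; i++)
        {
            // Combines a surrogate pair into one code point;
            // throws ArgumentException on an unpaired surrogate.
            codePoints.Add(char.ConvertToUtf32(text, i));

            // A code point outside the BMP occupies two chars, so skip the second one.
            if (char.IsHighSurrogate(text[i]))
                i++;
        }
        return codePoints;
    }

    static void Main()
    {
        foreach (int codePoint in GetCodePoints("मेरा"))
            Console.WriteLine(codePoint);
    }
}
```

For the example string this yields the same four numbers as the cast to int, because all four characters are in the BMP.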

Answer 3

Use

System.Text.Encoding.UTF8.GetBytes(abc)

That returns the string encoded as UTF-8 bytes, which is not the same thing as the decimal code point values.
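As a rough illustration of what this call produces for the example string, the sketch below dumps the bytes. UTF-8 uses three bytes for each Devanagari character, so the four characters come back as 12 bytes. Utf8Sketch and ToUtf8 are made-up names.

```csharp
using System;
using System.Text;

static class Utf8Sketch
{
    // Hypothetical helper: thin wrapper around Encoding.UTF8.GetBytes.
    public static byte[] ToUtf8(string text)
    {
        return Encoding.UTF8.GetBytes(text);
    }

    static void Main()
    {
        byte[] bytes = ToUtf8("मेरा");

        // Hex dump of the encoded bytes, separated by '-'.
        Console.WriteLine(BitConverter.ToString(bytes));

        // Number of bytes, not number of characters.
        Console.WriteLine(bytes.Length);

        // Decoding the bytes restores the original string.
        Console.WriteLine(Encoding.UTF8.GetString(bytes) == "मेरा");
    }
}
```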

Answer 4

If you are trying to convert files from a legacy encoding into Unicode:

Read the file, supplying the correct encoding of the source file, then write it out using the desired Unicode encoding scheme.

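    // "ISCII" is not a registered .NET encoding name; code page 57002 ("x-iscii-de") is ISCII Devanagari.
    using (StreamReader reader = new StreamReader(@"C:\MyFile.txt", Encoding.GetEncoding(57002)))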
    using (StreamWriter writer = new StreamWriter(@"C:\MyConvertedFile.txt", false, Encoding.UTF8))
    {
        writer.Write(reader.ReadToEnd());
    }

If you are looking for a mapping of Devanagari characters to Unicode code points:

You can find the chart on the Unicode Consortium website.

Note that Unicode code points are traditionally written in hexadecimal. So rather than the decimal number 2350, the code point would be written as U+092E, and it appears as 092E on the code chart.
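To line the decimal values up with the chart notation, something like the following sketch could be used. It assumes every char is a BMP character, and CodePointNotation and ToUPlus are hypothetical names.

```csharp
using System;

static class CodePointNotation
{
    // Hypothetical helper: formats a BMP char in the conventional "U+XXXX" form.
    public static string ToUPlus(char c)
    {
        // "X4" gives uppercase hexadecimal, zero-padded to four digits.
        return "U+" + ((int)c).ToString("X4");
    }

    static void Main()
    {
        // Prints the decimal value next to the chart notation for each char.
        foreach (char c in "मेरा")
            Console.WriteLine("{0} = {1}", (int)c, ToUPlus(c));
    }
}
```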

Answer 5

If you have the string s = "मेरा", then you already have the answer.

This string contains four code points in the BMP, which UTF-16 represents with 8 bytes. You can access them by index with s[i], with a foreach loop, etc.

If you want the underlying 8 bytes, you can access them like so:

string str = @"मेरा";
byte[] arr = System.Text.Encoding.Unicode.GetBytes(str);
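To relate those bytes back to the decimal values, here is a minimal sketch. It assumes the bytes come from Encoding.Unicode, which is little-endian UTF-16, so each pair of bytes is stored low byte first. Utf16BytesSketch and ToCodeUnits are made-up names.

```csharp
using System;
using System.Text;

static class Utf16BytesSketch
{
    // Hypothetical helper: rebuilds UTF-16 code units from little-endian bytes.
    public static int[] ToCodeUnits(byte[] utf16Le)
    {
        var units = new int[utf16Le.Length / 2];
        for (int i = 0; i < units.Length; i++)
        {
            // Low byte comes first, high byte second.
            units[i] = utf16Le[2 * i] | (utf16Le[2 * i + 1] << 8);
        }
        return units;
    }

    static void Main()
    {
        byte[] arr = Encoding.Unicode.GetBytes("मेरा");

        // Hex dump of the 8 bytes, separated by '-'.
        Console.WriteLine(BitConverter.ToString(arr));

        // One decimal value per UTF-16 code unit.
        foreach (int unit in ToCodeUnits(arr))
            Console.WriteLine(unit);
    }
}
```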

5 solutions

Solution 1: score 3, 2011-05-05 19:57:21

Solution 2: score 2, accepted, 2011-05-05 19:56:50

Solution 3: score 1, 2011-05-05 19:34:39

Solution 4: score 1, 2011-05-05 19:46:24

Solution 5: score 1, 2011-05-05 19:57:22
