

Converting Unicode character to a single hexadecimal value in C#

I am getting a character from an EMF record using Encoding.Unicode.GetString. The resulting string contains only one character, but that character came from two bytes. I don't know much about encoding schemes or multi-byte character sets. I want to convert that character to its equivalent single hexadecimal value. Can you help me with this?

It's not clear what you mean. A char in C# is a 16-bit unsigned value. If you've got a binary data source and you want to get Unicode characters, you should use an Encoding to decode the binary data into a string, which you can then access as a sequence of char values.
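A small sketch of the two points above, written as C# 9 top-level statements. It shows that a char is a 16-bit value and that two bytes decoded with Encoding.Unicode yield exactly one char. The byte values are invented for illustration. The assumption that the EMF data is UTF-16 little-endian comes only from the question's use of Encoding.Unicode.

```csharp
using System;
using System.Text;

// A char is an unsigned 16-bit UTF-16 code unit: 2 bytes, range 0x0000-0xFFFF.
Console.WriteLine(sizeof(char));        // 2
Console.WriteLine((int)char.MaxValue);  // 65535

// Hypothetical input: two bytes holding one UTF-16 little-endian code unit
// (low byte first), which is the layout Encoding.Unicode decodes.
byte[] recordBytes = { 0x23, 0x01 };
string decoded = Encoding.Unicode.GetString(recordBytes);

// Two bytes in, one char out. The char's numeric value is the 16-bit number
// assembled from the byte pair (low byte + high byte * 256).
char c = decoded[0];
int manual = recordBytes[0] | (recordBytes[1] << 8);

Console.WriteLine(decoded.Length);  // 1
Console.WriteLine(c == manual);     // True
```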

You can convert a char to a hex string by first converting it to an integer and then using the X format specifier, like this:

char c = '\u0123';
string hex = ((int)c).ToString("X4"); // Now hex = "0123"
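A few variations on the same formatting call, sketched below (C# 9 top-level statements; the sample characters are arbitrary). "X4" gives uppercase hex with at least four digits, "x4" gives lowercase, and an interpolated string can produce the conventional U+XXXX notation. Because a single char never exceeds 0xFFFF, four digits are always enough for one char.

```csharp
using System;

char c = '\u20AC';                          // EURO SIGN
int value = c;                              // char converts implicitly to int

string upper = value.ToString("X4");        // "20AC": uppercase hex, at least 4 digits
string lower = value.ToString("x4");        // "20ac": lowercase hex
string notation = $"U+{value:X4}";          // "U+20AC": conventional code point notation
string padded = ((int)'A').ToString("X4");  // "0041": short values are zero-padded

Console.WriteLine($"{upper} {lower} {notation} {padded}");
```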

Now, that leaves one more issue: surrogate pairs. Code points outside the Basic Multilingual Plane (U+0000 to U+FFFF) are represented by two UTF-16 code units: a high surrogate and a low surrogate. You can use the char.IsSurrogate* methods (for example char.IsSurrogatePair) to detect surrogate pairs, and char.ConvertToUtf32 to combine a pair into a single UCS-4 (UTF-32) value. If you're lucky, you won't need to deal with this at all. If you're happy converting your binary data into a sequence of UTF-16 code units instead of strict UCS-4 values, you don't need to worry.
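One way to handle surrogate pairs, sketched with the real char.IsSurrogatePair(string, int) and char.ConvertToUtf32(char, char) methods (C# 9 top-level statements; the sample string is arbitrary). The loop walks a string and emits one hex value per code point instead of one per char. An unpaired surrogate (malformed input) simply falls through and is printed as a raw code unit.

```csharp
using System;
using System.Collections.Generic;

// U+1F600 lies outside the BMP, so C# stores it as two chars (a surrogate pair).
string s = "A\U0001F600";

List<string> codePoints = new List<string>();
for (int i = 0; i < s.Length; i++)
{
    if (char.IsSurrogatePair(s, i))
    {
        // Combine the high and low surrogates into one UTF-32 code point.
        int cp = char.ConvertToUtf32(s[i], s[i + 1]);
        codePoints.Add(cp.ToString("X4"));
        i++; // skip the low surrogate we just consumed
    }
    else
    {
        // BMP character (or a lone surrogate): its char value is the code point.
        codePoints.Add(((int)s[i]).ToString("X4"));
    }
}

Console.WriteLine($"{s.Length} chars: {string.Join(" ", codePoints)}");
```

The string has three chars but only two code points, and "X4" widens to five digits for U+1F600 because it sets a minimum width, not a maximum.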

EDIT: Given your comments, it's still not entirely clear what you've got to start with. You say you've got two bytes: are they separate, or in a byte array? What do they represent? Presumably text in a particular encoding, but which encoding? Once you know the encoding, you can easily convert a byte array into a string:

using System.Text;

byte[] bytes = ...; // your binary data
// For example, if your binary data is UTF-8
string text = Encoding.UTF8.GetString(bytes);
char firstChar = text[0];
string hex = ((int)firstChar).ToString("X4");
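To illustrate why the encoding matters, here is a sketch (C# 9 top-level statements) that decodes the same two bytes three ways. The byte values are chosen only for illustration, and CodeUnitsToHex is a hypothetical helper. As UTF-8 they form one character, U+00E9 ("é"). As UTF-16 they form a different single char, whose value depends on byte order.

```csharp
using System;
using System.Text;

// Two sample bytes; their meaning depends entirely on the encoding used.
byte[] bytes = { 0xC3, 0xA9 };

string asUtf8 = Encoding.UTF8.GetString(bytes);                // one char, U+00E9
string asUtf16Le = Encoding.Unicode.GetString(bytes);          // one char, U+A9C3
string asUtf16Be = Encoding.BigEndianUnicode.GetString(bytes); // one char, U+C3A9

// Hypothetical helper: hex of every UTF-16 code unit, separated by spaces.
string CodeUnitsToHex(string text)
{
    string[] parts = new string[text.Length];
    for (int i = 0; i < text.Length; i++)
    {
        parts[i] = ((int)text[i]).ToString("X4");
    }
    return string.Join(" ", parts);
}

Console.WriteLine($"UTF-8:    {CodeUnitsToHex(asUtf8)}");
Console.WriteLine($"UTF-16LE: {CodeUnitsToHex(asUtf16Le)}");
Console.WriteLine($"UTF-16BE: {CodeUnitsToHex(asUtf16Be)}");
```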

If you could edit your question to give more details about your actual situation, it would be a lot easier to help you get to a solution. If you're generally confused about encodings and the difference between text and binary data, you might want to read my article about it.

Try this:

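using System.Linq; // for Aggregate
// Encoding.Unicode is UTF-16 little-endian: '\u0123' produces "2301", not "0123".
System.Text.Encoding.Unicode.GetBytes(theChar.ToString())
     .Aggregate("", (agg, val) => agg + val.ToString("X2"));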

However, since you don't specify exactly which encoding the character is in, this could fail. Further, you don't make it very clear whether you want the output to be a string of hex characters or the raw bytes. I'm guessing the former, since I'd guess you want to generate HTML. Let me know if any of this is wrong.
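A sketch of the byte-order detail in the snippet above (C# 9 top-level statements, assuming .NET 5+ for Convert.ToHexString). Encoding.Unicode writes the low byte first, so the hex digits come out swapped relative to the code point. Encoding.BigEndianUnicode writes the high byte first and matches the code point's digits.

```csharp
using System;
using System.Linq;
using System.Text;

char theChar = '\u0123';

// The approach above: hex of the UTF-16 little-endian bytes (low byte first).
string viaAggregate = Encoding.Unicode.GetBytes(theChar.ToString())
    .Aggregate("", (agg, val) => agg + val.ToString("X2"));

// Same bytes, formatted with Convert.ToHexString (.NET 5+), no repeated concatenation.
string viaConvert = Convert.ToHexString(Encoding.Unicode.GetBytes(theChar.ToString()));

// Big-endian UTF-16 puts the high byte first, matching the code point's digits.
string bigEndian = Convert.ToHexString(Encoding.BigEndianUnicode.GetBytes(theChar.ToString()));

Console.WriteLine($"{viaAggregate} {viaConvert} {bigEndian}");
```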

I created an extension method to convert a Unicode or non-Unicode string to a hex string.

I'm sharing it here in case anyone finds it useful.

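using System;
using System.Linq;
using System.Text;

public static class StringHelper
{
    // Returns the string's bytes as an uppercase hex string with no separators.
    // Note: Encoding.Default is the system ANSI code page on .NET Framework,
    // but always UTF-8 on .NET Core / .NET 5+, so results can differ by platform.
    public static string ToHexString(this string str)
    {
        byte[] bytes = str.IsUnicode() ? Encoding.UTF8.GetBytes(str) : Encoding.Default.GetBytes(str);

        return BitConverter.ToString(bytes).Replace("-", string.Empty);
    }

    // Treats a string as "Unicode" if any char lies above 255.
    public static bool IsUnicode(this string input)
    {
        const int maxAnsiCode = 255;

        return input.Any(c => c > maxAnsiCode);
    }
}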
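Because Encoding.Default differs between .NET Framework (system ANSI code page) and .NET Core / .NET 5+ (UTF-8), the extension above can return different hex for the same string depending on the runtime. An alternative design is to have the caller name the encoding. The sketch below does this with a hypothetical local function, also called ToHexString, that is not part of the answer's class (C# 9 top-level statements, assuming .NET 5+ for Convert.ToHexString and Encoding.Latin1).

```csharp
using System;
using System.Text;

// Hypothetical variant: the caller names the encoding explicitly,
// instead of the method guessing from the character range.
string ToHexString(string str, Encoding encoding) =>
    Convert.ToHexString(encoding.GetBytes(str));

string sample = "\u00E9";  // "é": IsUnicode() above would classify it as non-Unicode

string utf8 = ToHexString(sample, Encoding.UTF8);       // two bytes in UTF-8
string utf16Le = ToHexString(sample, Encoding.Unicode); // two bytes, low byte first
string latin1 = ToHexString(sample, Encoding.Latin1);   // one byte in ISO-8859-1

Console.WriteLine($"{utf8} {utf16Le} {latin1}");
```

The same one-character string yields three different hex strings, which is why it helps to decide the encoding up front rather than infer it.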

Get thee to StringInfo:

http://msdn.microsoft.com/en-us/library/system.globalization.stringinfo.aspx

http://msdn.microsoft.com/en-us/library/8k5611at.aspx

The .NET Framework supports text elements. A text element is a unit of text that is displayed as a single character, called a grapheme. A text element can be a base character, a surrogate pair, or a combining character sequence. The StringInfo class provides methods that allow your application to split a string into its text elements and iterate through the text elements. For an example of using the StringInfo class, see String Indexing.
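A sketch of iterating text elements with StringInfo.GetTextElementEnumerator and printing the code points inside each one (C# 9 top-level statements; the sample string is arbitrary). The combining sequence "e" + U+0301 and the surrogate pair for U+1F600 each count as a single text element, even though they occupy two chars apiece.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// "e" + COMBINING ACUTE ACCENT, an emoji outside the BMP (surrogate pair), and "A".
string s = "e\u0301\U0001F600A";

var elements = new List<string>();
TextElementEnumerator enumerator = StringInfo.GetTextElementEnumerator(s);
while (enumerator.MoveNext())
{
    string element = enumerator.GetTextElement();
    var codePoints = new List<string>();
    for (int i = 0; i < element.Length; i++)
    {
        // ConvertToUtf32 reads the whole pair when element[i] is a high surrogate
        // (and throws ArgumentException on an unpaired surrogate).
        int cp = char.ConvertToUtf32(element, i);
        codePoints.Add("U+" + cp.ToString("X4"));
        if (char.IsSurrogatePair(element, i))
        {
            i++; // the pair occupied two chars
        }
    }
    elements.Add(string.Join(" ", codePoints));
}

Console.WriteLine($"{s.Length} chars, {new StringInfo(s).LengthInTextElements} text elements");
foreach (string line in elements)
{
    Console.WriteLine(line);
}
```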

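Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.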

 
© 2020-2024 STACKOOM.COM