
C#: How to convert a Unicode character to its ASCII equivalent

Original post 2011-02-28 10:22:08 6 2 c#/ unicode/ ascii

I know this is a recurring question here, but none of the answers have worked for me.

I'm receiving Unicode text from another system: just an email address and a name from customers.

When I save these strings to my SQL DB, some characters show up as \u escapes.

For example, the emails end up in the DB as: name\u0040domain.com

How can I convert the Unicode string to ASCII in my C# program, so that the DB gets name@domain.com?

I'd also like to replace special characters with an ASCII equivalent, or drop them if there is none. For example, "Hernán π" should become "Hernan ".

Thanks!

2 answers

IMHO, converting Unicode back to ASCII for some dubious storage or technical benefit isn't a good idea in the 21st century, especially since email is being changed to support Unicode in headers and bodies.

http://en.wikipedia.org/wiki/Unicode_and_e-mail

If the reason you want to convert Hernán to Hernan is for searching, you should look at using an accent-insensitive (AI) collation on your database, or coerce it to do so; see this SO post.

One thing you might need to double-check, however, is that your strings aren't getting pre-encoded before storage in your database (assuming that your DB column is set to accept Unicode, i.e. NVARCHAR etc.). The character '@' should be stored as '@' (0040 in UTF-16) and not as '\u0040'.

EDIT: The "\uNNNN" encoding in a string might originate from Java or Python. You might be able to trace the email string data up your architecture to find the source of this encoding and change it to something easier to decode in C#, such as UTF-8.

How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
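If the upstream source of the "\uNNNN" escapes can't be changed, one fallback is to decode them in C# before writing to the database. The following is only a minimal sketch: the names EscapeDecoder and DecodeUnicodeEscapes are made up for illustration, and it assumes the escapes are always a backslash, a lowercase u, and exactly four hex digits.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
static class EscapeDecoder
{
    // Matches a literal backslash, 'u', and exactly four hex digits.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces each \uNNNN sequence with the UTF-16 code unit it names.
    // Text that contains no such sequence is returned unchanged.
    public static string DecodeUnicodeEscapes(string input)
    {
        return UnicodeEscape.Replace(input, m =>
            ((char)int.Parse(m.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

With this sketch, EscapeDecoder.DecodeUnicodeEscapes(@"name\u0040domain.com") returns "name@domain.com". Fixing the encoding at its source is still the cleaner option, since a regex like this can't tell an escape from a customer string that legitimately contains the characters \u0040.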

You can use Encoding.Convert for such operations. Read about this on MSDN.
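Note that Encoding.Convert to Encoding.ASCII on its own replaces every non-ASCII character with '?', so "Hernán" would come out as "Hern?n". To get closer to what the question asks for, one possible approach (a sketch under assumptions, not necessarily what this answer had in mind) is to decompose accented letters first, strip the combining marks, and then convert with an ASCII encoding whose fallback drops unmappable characters. AsciiFolder and ToAscii are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of any library.
static class AsciiFolder
{
    public static string ToAscii(string input)
    {
        // Decompose accented letters: "á" becomes "a" followed by U+0301.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Drop the combining marks, keeping the base letters.
        string stripped = new string(decomposed
            .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            .ToArray());

        // ASCII encoding that silently drops what it cannot represent
        // instead of substituting '?'.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        // Convert from UTF-16 bytes to ASCII bytes, then back to a string.
        byte[] asciiBytes = Encoding.Convert(
            Encoding.Unicode, ascii, Encoding.Unicode.GetBytes(stripped));
        return ascii.GetString(asciiBytes);
    }
}
```

Here AsciiFolder.ToAscii("Hernán π") returns "Hernan " (with the trailing space), because π has no ASCII equivalent and is dropped. Letters that have no canonical decomposition, such as "ø" or "ß", are dropped as well rather than mapped to "o" or "ss", so a lookup table would be needed if those matter.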