
C#: How to convert a Unicode character to its ASCII equivalent

I know it's a recurring question here, but none of the answers have worked for me.

I'm receiving Unicode text from another system. It's just an email address and a name from customers.

When I save these strings to my SQL DB, some characters appear escaped with \\u.

For example, the emails end up in the DB as: name\@domain.com

How do I transform the Unicode string to ASCII in my C# program, so that the DB gets name@domain.com?

I'd also like to replace special characters with their ASCII equivalent, or drop them if there is none. For example, "Hernán π" should become "Hernan ".

Thanks!

IMHO converting Unicode back to ASCII for some dubious storage or technical benefit isn't a good idea in the 21st century, especially since email is being changed to support Unicode in headers and bodies.

http://en.wikipedia.org/wiki/Unicode_and_e-mail

If the reason you want to convert Hernán to Hernan is for searching, you should look at using an Accent Insensitive (AI) collation on your database, or coerce it to do so - see this SO post.

One thing you might want to double-check, however, is that your strings aren't being pre-encoded before they are stored in your database (assuming your DB column is set to accept Unicode, i.e. NVARCHAR etc.). The character '@' should be stored as '@' (0040 in UTF-16) and not as '\@'.

EDIT: The "\\uNNNN" encoding in a string might originate from Java or Python. You might be able to trace the email string data back up your architecture to find the source of this encoding and change it to something easier to decode in C#, such as UTF-8.

How do I treat an ASCII string as unicode and unescape the escaped characters in it in python?
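If the escaping cannot be fixed at the source, the escaped text could probably be decoded on the C# side before it is written to the database. The following is only a minimal sketch, assuming the input contains literal backslash-u sequences followed by exactly four hex digits; the class and method names (`EscapeDecoder`, `DecodeUnicodeEscapes`) are made up for illustration and are not part of any library.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

// Hypothetical helper, not a framework class.
public static class EscapeDecoder
{
    // Matches a literal backslash, the letter 'u', and exactly four hex digits,
    // for example the six characters \u0040.
    private static readonly Regex UnicodeEscape =
        new Regex(@"\\u([0-9A-Fa-f]{4})", RegexOptions.Compiled);

    // Replaces every \uNNNN sequence with the UTF-16 code unit it names.
    // Text that does not match the pattern is left untouched.
    public static string DecodeUnicodeEscapes(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        return UnicodeEscape.Replace(input, match =>
        {
            int codeUnit = int.Parse(
                match.Groups[1].Value,
                NumberStyles.HexNumber,
                CultureInfo.InvariantCulture);
            return ((char)codeUnit).ToString();
        });
    }
}
```

With this sketch, `EscapeDecoder.DecodeUnicodeEscapes(@"name\u0040domain.com")` gives `name@domain.com`. Other escape styles (for example `\xNN`) are deliberately not handled here.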

You can use Encoding.Convert for such operations. Read about it on MSDN.
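A rough sketch of how that might look for the "Hernán π" example from the question. `Encoding.Convert` to plain ASCII would by default turn every non-ASCII character into `?`, so this version assumes two extra steps that the answer does not mention: decomposing accented letters first and removing the combining marks, and using an ASCII encoding whose replacement fallback is an empty string so that characters with no ASCII equivalent are dropped. The `AsciiFolder` class and `ToAscii` method are illustrative names only.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper, not a framework class.
public static class AsciiFolder
{
    public static string ToAscii(string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));

        // Step 1: decompose, so that 'á' becomes 'a' followed by a combining accent.
        string decomposed = input.Normalize(NormalizationForm.FormD);

        // Step 2: drop the combining marks, keeping the base letters.
        var withoutMarks = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                withoutMarks.Append(c);
            }
        }

        // Step 3: convert UTF-16 to ASCII. The empty replacement fallback makes
        // the encoder omit anything that still has no ASCII form (such as 'π')
        // instead of writing '?'.
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback(string.Empty),
            new DecoderReplacementFallback(string.Empty));

        byte[] utf16Bytes = Encoding.Unicode.GetBytes(withoutMarks.ToString());
        byte[] asciiBytes = Encoding.Convert(Encoding.Unicode, ascii, utf16Bytes);
        return ascii.GetString(asciiBytes);
    }
}
```

Under these assumptions, `AsciiFolder.ToAscii("Hernán π")` yields `"Hernan "`. Note that this is lossy by design: letters that do not decompose into a base letter plus accent (for example 'ø' or 'ß') are simply removed.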
