
How to correctly decode JSON data with Unicode strings in it

I'm reading a JSON file in which some fields contain strings like the following: "Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."

The end result should look like this: "Eduardo Fonseca Bolaños compartió una publicación."

  • Is there an out-of-the-box converter to do this in C#?
  • What is the correct way to convert this kind of JSON data?

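You can use the Json.NET library to decode the string. The deserializer decodes the \uXXXX escapes automatically, but here each escape holds one byte of the UTF-8 encoding rather than a whole character, so the decoded string still has to be converted back to bytes and read as UTF-8.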

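using System;
using System.Text;
using Newtonsoft.Json;

public class Example
{
    public string Name { get; set; }
}

public static class Program
{
    // ISO-8859-1 maps each char U+0000..U+00FF to the byte with the same value.
    // Encoding.Default is UTF-8 on .NET Core and later, so it would not recover the bytes there.
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    public static void Main()
    {
        var json = @"{ ""Name"" : ""Eduardo Fonseca Bola\u00c3\u00b1os comparti\u00c3\u00b3 una publicaci\u00c3\u00b3n."" }";

        // Deserialize without a target type, then repair the whole serialized object.
        var parsed = JsonConvert.DeserializeObject(json);
        byte[] objectBytes = Latin1.GetBytes(parsed.ToString());
        Console.WriteLine(Encoding.UTF8.GetString(objectBytes));

        // Deserialize into a class and repair only the affected property.
        var sample = JsonConvert.DeserializeObject<Example>(json);
        byte[] nameBytes = Latin1.GetBytes(sample.Name);
        Console.WriteLine(Encoding.UTF8.GetString(nameBytes));
    }
}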

The first WriteLine prints the object below; the second prints only the Name value:

{
  "Name": "Eduardo Fonseca Bolaños compartió una publicación."
}
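The round trip above can be wrapped in a small reusable helper. The following is only a sketch under stated assumptions: the names MojibakeRepair and Fix are hypothetical (they are not part of Json.NET or of the answer above), and the guard logic is one possible choice. It re-encodes the string as ISO-8859-1 and decodes the bytes strictly as UTF-8, returning the input unchanged when the text contains characters above U+00FF or when the bytes are not valid UTF-8, so strings that were already correct should pass through untouched.

```csharp
using System;
using System.Linq;
using System.Text;

// Hypothetical helper; the class and method names are illustrative only.
public static class MojibakeRepair
{
    private static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // Strict UTF-8 decoder: throws on invalid byte sequences instead of
    // silently substituting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string Fix(string text)
    {
        // Characters above U+00FF cannot have come from single bytes.
        if (string.IsNullOrEmpty(text) || text.Any(c => c > 0xFF))
            return text;

        try
        {
            // Recover the original bytes, then read them as UTF-8.
            return StrictUtf8.GetString(Latin1.GetBytes(text));
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8 when viewed as bytes: assume the text was fine.
            return text;
        }
    }
}
```

With the Example class from the answer, this would be called as MojibakeRepair.Fix(sample.Name). A string that happens to look like valid UTF-8 when viewed as bytes would still be rewritten, so the guard is a heuristic, not a guarantee.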

Option 2

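You can use the System.Web.Helpers.Json.Decode method, which ships with ASP.NET Web Pages on .NET Framework, so no third-party library is needed. Like Json.NET, it only decodes the \uXXXX escapes, so the same byte-to-UTF-8 conversion still applies afterwards.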

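Here is the fix for this specific situation. It works on the raw JSON text before parsing: each run of \u00XX escapes is rewritten as %XX and then URL-decoded, which reads the bytes as UTF-8.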

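    // Requires: using System.Text.RegularExpressions; and a reference to System.Web (HttpUtility).
    // Matches one or more consecutive \uXXXX escape sequences (hex digits only).
    private static readonly Regex _regex =
        new Regex(@"(\\u(?<Value>[a-fA-F0-9]{4}))+", RegexOptions.Compiled);

    private static string ConvertUnicodeEscapeSequencetoUTF8Characters(string sourceContent)
    {
        // Check https://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string
        return _regex.Replace(
            sourceContent, m =>
            {
                // Turn each \u00XX into %XX, then let UrlDecode read the bytes as UTF-8.
                // Caution: ASCII escapes such as \u0022 are unescaped as well.
                var urlEncoded = m.Groups[0].Value.Replace(@"\u00", "%");
                return System.Web.HttpUtility.UrlDecode(urlEncoded);
            }
        );
    }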
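The same idea can be expressed without the URL-decoding detour. The sketch below is an illustration under assumptions, not the original author's code: the names EscapedUtf8Repair, ByteRun and Fix are hypothetical. It collects each run of \u0080 to \u00ff escapes from the raw JSON text into a byte array and decodes it strictly as UTF-8. Runs that are not valid UTF-8 (for example a lone \u00e9 that really means é) are left as they are, and ASCII escapes such as \u0022 are never touched. An escaped backslash directly in front of a run is not handled.

```csharp
using System;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper; names are illustrative, not from the original answer.
public static class EscapedUtf8Repair
{
    // One or more consecutive \u0080..\u00ff escapes (non-ASCII single-byte values).
    private static readonly Regex ByteRun =
        new Regex(@"(?:\\u00[89a-fA-F][0-9a-fA-F])+", RegexOptions.Compiled);

    // Strict decoder: throws instead of emitting U+FFFD for invalid bytes.
    private static readonly Encoding StrictUtf8 = new UTF8Encoding(false, true);

    public static string Fix(string rawJson)
    {
        return ByteRun.Replace(rawJson, m =>
        {
            // Each escape is 6 characters long: \u00XX.
            var bytes = new byte[m.Length / 6];
            for (int i = 0; i < bytes.Length; i++)
            {
                string hex = m.Value.Substring(i * 6 + 4, 2);
                bytes[i] = byte.Parse(hex, NumberStyles.HexNumber, CultureInfo.InvariantCulture);
            }

            try
            {
                return StrictUtf8.GetString(bytes);
            }
            catch (DecoderFallbackException)
            {
                // Not a UTF-8 byte sequence: keep the escapes for the JSON parser.
                return m.Value;
            }
        });
    }
}
```

For example, EscapedUtf8Repair.Fix(@"comparti\u00c3\u00b3") returns "compartió"; the repaired text can then be passed to any JSON parser.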

Based on "Replace unicode escape sequences in a string" (https://stackoverflow.com/questions/9738282/replace-unicode-escape-sequences-in-a-string).
