
C# REST Client - Encoding special characters in XML

I'm working on an application in C# which pulls user data from Active Directory (using DirectorySearcher) and posts it to a remote site using a REST API. But some names contain special characters such as ØÆÅ, and I can't figure out how to encode them properly. The API expects to receive them encoded as numeric character references such as &#230; (for æ).

The following is a test stub:

using System;
using System.Collections.Generic;
using System.Text;
using System.Xml;
using System.IO;

namespace Encodingtest
{
    class Program
    {
        static void Main(string[] args)
        {
            XmlWriterSettings xws = new XmlWriterSettings();
            xws.Encoding = Encoding.UTF8;

            StringWriter sw = new StringWriter();
            using (XmlWriter xw = XmlWriter.Create(sw, xws))
            {
                xw.WriteStartElement("test");
                xw.WriteElementString("element", "test øæåØÆÅ");
                xw.WriteEndElement();
                xw.Flush();
                xw.Close();
            }
            Console.WriteLine(sw.ToString());
            Console.ReadLine();
        }
    }
}

The problem is that the output is still in the same format as the input: readable Danish characters rather than their numeric character references.

The REST API is a Rails app, btw. I assume that any string data in the C# app is Unicode by default.

Any help and hints are greatly appreciated.

Cheers

Any system processing XML should be able to handle UTF-8 encoded content, especially if the encoding is explicitly declared as UTF-8. Those characters should not have to be encoded as numeric character references.

If you want to ensure that those characters are serialized as numeric character references, then set your encoding to a smaller character set, such as ASCII (us-ascii).

In your code, change: xws.Encoding = Encoding.UTF8;

to: xws.Encoding = Encoding.ASCII;

Since those characters are outside of the ASCII character set, they will be serialized as numeric character references.
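One caveat with the test stub: as far as I understand the documented behaviour, when the XmlWriter is created over a TextWriter such as StringWriter, the Encoding setting is overridden by the TextWriter's own encoding (UTF-16 for StringWriter), so switching to Encoding.ASCII may appear to have no effect there. Here is a minimal sketch that writes to a MemoryStream instead, so the setting is honoured. The class and method names (AsciiXmlDemo, WriteAsAscii) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

static class AsciiXmlDemo
{
    // Hypothetical helper: serializes <test><element>value</element></test>
    // as US-ASCII XML and returns the bytes decoded back into a string.
    public static string WriteAsAscii(string value)
    {
        var settings = new XmlWriterSettings { Encoding = Encoding.ASCII };

        using (var stream = new MemoryStream())
        {
            // The writer targets a Stream (not a TextWriter), so the
            // Encoding from the settings is the one actually applied.
            using (XmlWriter writer = XmlWriter.Create(stream, settings))
            {
                writer.WriteStartElement("test");
                writer.WriteElementString("element", value);
                writer.WriteEndElement();
            } // disposing the writer flushes it; the stream stays open by default

            return Encoding.ASCII.GetString(stream.ToArray());
        }
    }
}
```

Calling AsciiXmlDemo.WriteAsAscii("test øæåØÆÅ") should give XML containing only ASCII characters. Note that, in my experience, XmlWriter emits the references in hexadecimal form (for example &#xF8; rather than &#248;). Both forms are valid XML, but check which one the receiving API accepts.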

Perhaps just resort to your own "numeric character reference" generator:

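foreach (char c in "test øæåØÆÅ")
{
    // Characters outside ASCII become decimal numeric character references, e.g. &#230; for æ.
    string encoding = (int)c >= 0x80 ? String.Format("&#{0};", (int)c) : c.ToString();
    Console.Write(encoding);
}

The above code produces the output "test &#248;&#230;&#229;&#216;&#198;&#197;", which matches the result from an online converter.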
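If this is going to be reused, the loop could be wrapped in a helper. The following is only a sketch under a few assumptions: the names (NumericReferenceEncoder, Encode) are hypothetical, and it goes slightly beyond the loop above by combining surrogate pairs into a single code point, so characters outside the Basic Multilingual Plane get one reference instead of two.

```csharp
using System;
using System.Globalization;
using System.Text;

static class NumericReferenceEncoder
{
    // Hypothetical helper: replaces every non-ASCII code point with a
    // decimal numeric character reference such as &#230;.
    // It does NOT escape XML markup characters (&, <, >).
    public static string Encode(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            // ConvertToUtf32 merges a surrogate pair into one code point;
            // it throws ArgumentException on an unpaired surrogate.
            int codePoint = char.ConvertToUtf32(input, i);
            if (char.IsHighSurrogate(input[i]))
            {
                i++; // skip the low surrogate that was just consumed
            }

            if (codePoint >= 0x80)
            {
                sb.Append("&#")
                  .Append(codePoint.ToString(CultureInfo.InvariantCulture))
                  .Append(';');
            }
            else
            {
                sb.Append((char)codePoint);
            }
        }
        return sb.ToString();
    }
}
```

Keep in mind that the result contains literal & characters, so passing it to WriteElementString would escape them again (giving &amp;#230;). Writing it with XmlWriter.WriteRaw is one way around that, provided the rest of the string has already been escaped for XML.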
