
Invalid XML characters

I have a text file (UTF-8). Its content is extracted from rich text documents, which might be MS Word, PDF, HTML or anything else. I have to pass this content to a web service, but most of the time it contains invalid characters such as form feed or null. When I pass the content of a file that contains an invalid character to the web service, it throws an exception ("not a valid XML character").

I have found a few characters that are not valid in XML, but is there a proper .NET function to clean the string and remove all invalid characters, or a list of invalid characters from an authoritative source?

Thanks for your help in advance.

http://java.net/jira/browse/JAXB-614

This link will help you with the set. The invalid XML characters are: '\u0000' through '\u0008', '\u000B', '\u000C', '\u000E' through '\u001F', '\uFFFE' and '\uFFFF'.
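The invalid characters listed above follow from the Char production of the XML 1.0 specification, which allows only #x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD and #x10000-#x10FFFF. A minimal Python sketch of that rule can regenerate the list; the function name is_xml_char is hypothetical, not a library API. It skips the surrogate block U+D800-U+DFFF, which the production also excludes but which the list does not mention.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Every BMP code point the production rejects, leaving out the
# surrogate block U+D800-U+DFFF (also rejected, but listed separately).
invalid = [
    cp for cp in range(0x10000)
    if not (0xD800 <= cp <= 0xDFFF) and not is_xml_char(cp)
]

print(", ".join(f"U+{cp:04X}" for cp in invalid))
```

Tab (U+0009), line feed (U+000A) and carriage return (U+000D) are the only control characters below U+0020 that survive, which is why form feed (U+000C) and null (U+0000) break the web service call.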

If it's important to send the file's content without any modification, the best option is to escape the content. If it isn't, try the XmlConvert.IsXmlChar method, which checks whether a character is valid in XML. Check this answer of mine for code samples.
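XmlConvert.IsXmlChar is a .NET method; the sketch below is a Python stand-in, not the .NET API, that applies the same kind of per-character test to filter a string. The names is_xml_char and strip_invalid_xml_chars are hypothetical, and the optional replacement parameter is an assumption added for cases where silently dropping a form feed would join two words. In C#, where strings are UTF-16, an equivalent loop would also need to keep valid surrogate pairs intact, since IsXmlChar rejects each half of a pair on its own.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_xml_char(cp: int) -> bool:
    # XML 1.0 Char production:
    # #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Drop (or substitute) every character the XML 1.0 Char production rejects."""
    return "".join(ch if is_xml_char(ord(ch)) else replacement for ch in text)


# Hypothetical extractor output containing a form feed and a NUL.
sample = "Page 1\x0cPage 2\x00 end\ttab\n"
cleaned = strip_invalid_xml_chars(sample)
print(repr(cleaned))  # 'Page 1Page 2 end\ttab\n'

# Once cleaned (and with &, <, > escaped), the text fits in a well-formed document.
root = ET.fromstring(f"<doc>{escape(cleaned)}</doc>")
print(root.text == cleaned)  # True
```

This modifies the content, so it only fits the second case described above, where losing the invalid characters is acceptable.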

Probably the best way is, for example, to encode the whole text in Base64.

http://en.wikipedia.org/wiki/Base64
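A minimal Python sketch of the Base64 approach, assuming the web service can be changed to decode the payload on its side (the helper names to_base64 and from_base64 are hypothetical). The text is first encoded as UTF-8 bytes, so every character, including form feed and null, survives the round trip, and the encoded form uses only ASCII letters, digits, '+', '/' and '=', all of which are valid in XML.

```python
import base64


def to_base64(text: str) -> str:
    """Encode text as UTF-8 bytes, then as Base64 ASCII."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def from_base64(payload: str) -> str:
    """Reverse to_base64 on the receiving side."""
    return base64.b64decode(payload).decode("utf-8")


original = "Page 1\x0cPage 2\x00 end"
payload = to_base64(original)
print(payload)  # only XML-safe ASCII characters
print(from_base64(payload) == original)  # True: nothing is lost
```

The trade-off is size: Base64 produces 4 output characters for every 3 input bytes, so the payload grows by about a third, and the receiver must know to decode it.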

Regards,

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.
