
How to encode special characters in XML

My XML string contains a whole series of special characters:

&egrave;
&rsquo;
&rsquo;
&rsquo;
&ldquo;
&rdquo;
&rsquo
&agrave;
&agrave;

I need to replace these special characters in the string I insert into the DB. I tried System.Net.WebUtility.HtmlEncode without success. Can you help me?

string sql = "insert into rss (title, description, link, pubdate) values (?,?,?, " +
             " STR_TO_DATE(?, '%a, %d %b %Y %H:%i:%s GMT'));";

OdbcCommand command;
OdbcDataAdapter adapter = new OdbcDataAdapter();
connection.Open();
command = new OdbcCommand(sql, connection);
// ODBC binds the "?" placeholders by position; the parameter names are ignored.
command.Parameters.AddWithValue("param1", System.Net.WebUtility.HtmlEncode(xmlTitle.InnerText));
command.Parameters.AddWithValue("param2", System.Net.WebUtility.HtmlEncode(xmlDescription.InnerText));
command.Parameters.AddWithValue("param3", System.Net.WebUtility.HtmlEncode(xmlLink.InnerText));
command.Parameters.AddWithValue("param4", System.Net.WebUtility.HtmlEncode(xmlPubDate.InnerText));
adapter.InsertCommand = command;
adapter.InsertCommand.ExecuteNonQuery();
connection.Close();

You can use a native .NET method for escaping special characters in text. Sure, there are only 5 special characters, and 5 Replace() calls would probably do the trick, but I'm sure there's got to be something built in.

Example of converting "&" to "&amp;"

To my relief, I've discovered a native method, hidden away in the bowels of the SecurityElement class. Yes, that's right: SecurityElement.Escape(string s) will escape your string and make it XML-safe.

This is important: if we copy or write data to InfoPath text fields, the data first needs to be escaped so that a character such as "&" is written as the entity "&amp;".

Invalid XML character   Replaced with
<                       &lt;
>                       &gt;
"                       &quot;
'                       &apos;
&                       &amp;

The namespace is "System.Security". Reference: http://msdn2.microsoft.com/en-us/library/system.security.securityelement.escape(VS.80).aspx
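As a minimal sketch of how SecurityElement.Escape might be used when building XML by hand, the helper below wraps a title in a <title> element. The class and method names (TitleXml, BuildTitleElement) are hypothetical, chosen only for illustration.

```csharp
using System.Security;

// Hypothetical helper class, for illustration only.
public static class TitleXml
{
    // SecurityElement.Escape replaces <, >, ", ' and & with their predefined
    // XML entities. It returns null for null input, so null is mapped to "".
    public static string BuildTitleElement(string title)
    {
        string escaped = SecurityElement.Escape(title ?? string.Empty);
        return "<title>" + escaped + "</title>";
    }
}
```

For example, TitleXml.BuildTitleElement("Tom & Jerry <\"classic\">") returns <title>Tom &amp; Jerry &lt;&quot;classic&quot;&gt;</title>.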

The other option is to write custom code:

public static class XmlEscapeExtensions
{
  // Escaping: "&" must be replaced first; otherwise the "&" at the start
  // of each entity added below would be escaped a second time.
  public static string EscapeXml( this string s )
  {
    string toxml = s;
    if ( !string.IsNullOrEmpty( toxml ) )
    {
      // replace literal values with entities
      toxml = toxml.Replace( "&", "&amp;" );
      toxml = toxml.Replace( "'", "&apos;" );
      toxml = toxml.Replace( "\"", "&quot;" );
      toxml = toxml.Replace( ">", "&gt;" );
      toxml = toxml.Replace( "<", "&lt;" );
    }
    return toxml;
  }

  // Unescaping: "&amp;" must be replaced last; otherwise "&amp;lt;"
  // (an escaped "&lt;") would wrongly turn into "<".
  public static string UnescapeXml( this string s )
  {
    string unxml = s;
    if ( !string.IsNullOrEmpty( unxml ) )
    {
      // replace entities with literal values
      unxml = unxml.Replace( "&apos;", "'" );
      unxml = unxml.Replace( "&quot;", "\"" );
      unxml = unxml.Replace( "&gt;", ">" );
      unxml = unxml.Replace( "&lt;", "<" );
      unxml = unxml.Replace( "&amp;", "&" );
    }
    return unxml;
  }
}

You can use HttpUtility.HtmlDecode, or on .NET 4.0+ you can also use WebUtility.HtmlDecode.

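Instead of System.Net.WebUtility.HtmlEncode you must use System.Net.WebUtility.HtmlDecode. The string already contains HTML entities such as &egrave;, so HtmlEncode just escapes their leading "&" again (&egrave; becomes &amp;egrave;), whereas HtmlDecode turns them into the actual characters (è).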
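A minimal sketch contrasting the two calls on the entities from the question, assuming the text comes from InnerText as in the question's code. The class and method names (EntityDecoding, EncodeAgain, Decode) are hypothetical.

```csharp
using System.Net;

// Hypothetical class name, for illustration only.
public static class EntityDecoding
{
    // HtmlEncode treats "&egrave;" as plain text and escapes its "&",
    // producing "&amp;egrave;" rather than the letter.
    public static string EncodeAgain(string text) => WebUtility.HtmlEncode(text);

    // HtmlDecode resolves named entities such as &egrave;, &rsquo;, &ldquo;
    // into the characters they stand for (è, ’, “).
    public static string Decode(string text) => WebUtility.HtmlDecode(text);
}
```

In the question's code, that would mean passing System.Net.WebUtility.HtmlDecode(xmlTitle.InnerText) to AddWithValue instead of the HtmlEncode call. Note that HtmlDecode only recognizes entities terminated by ";", so a bare "&rsquo" is left unchanged.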

There are 3 other ways this can be done, based on what you tried:

  1. Use string.Replace() 5 times
  2. Use System.Web.HttpUtility.HtmlEncode()
  3. Use System.Xml.XmlTextWriter
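For option 3, a minimal sketch that uses XmlWriter.Create, the factory .NET recommends over constructing XmlTextWriter directly (a swap from the class named in the list). The writer escapes characters such as & and < in text content by itself. The class and method names (XmlWriterEscape, WriteTitle) are hypothetical.

```csharp
using System.Text;
using System.Xml;

// Hypothetical class name, for illustration only.
public static class XmlWriterEscape
{
    // Writes <title>text</title>; the writer escapes the text content.
    public static string WriteTitle(string text)
    {
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        var sb = new StringBuilder();
        using (XmlWriter writer = XmlWriter.Create(sb, settings))
        {
            writer.WriteElementString("title", text);
        }
        return sb.ToString();
    }
}
```

For example, XmlWriterEscape.WriteTitle("a & b < c") returns <title>a &amp; b &lt; c</title>.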

I could explain each case, but I found this link mightily useful.

The statement toxml = toxml.Replace( "&", "&amp;" ); has to be executed first. Otherwise, if it runs last, it also replaces the "&" that starts every entity added by the earlier calls, so a ' or " ends up as &amp;apos; or &amp;quot; instead of &apos; or &quot;.
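The ordering problem shows up with just two of the five replacements. This trimmed sketch handles only & and " (hypothetical class and method names) and puts both orders side by side.

```csharp
// Hypothetical class name, for illustration only.
public static class ReplaceOrderDemo
{
    // Correct: "&" is escaped first, so only original ampersands become &amp;.
    public static string AmpersandFirst(string s) =>
        s.Replace("&", "&amp;").Replace("\"", "&quot;");

    // Wrong: "&" is escaped last, so the "&" of each &quot; just produced
    // is escaped again and the quote ends up double-encoded.
    public static string AmpersandLast(string s) =>
        s.Replace("\"", "&quot;").Replace("&", "&amp;");
}
```

For the input "&" (a quote, an ampersand, a quote), AmpersandFirst returns &quot;&amp;&quot;, while AmpersandLast returns &amp;quot;&amp;&amp;quot;.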

Simple code:

    using System;
    using System.Linq;
    using System.Text;

    public static class XmlStrUtil
    {
        // Order matters: "&" is escaped first and "&amp;" is unescaped last.
        public static string ToXmlStr(string value) => String.IsNullOrEmpty(value) ? "" : value.Replace("&", "&amp;").Replace("'", "&apos;").Replace("\"", "&quot;").Replace(">", "&gt;").Replace("<", "&lt;");
        public static string FromXmlStr(string xmlStr) => String.IsNullOrEmpty(xmlStr) ? "" : xmlStr.Replace("&apos;", "'").Replace("&quot;", "\"").Replace("&gt;", ">").Replace("&lt;", "<").Replace("&amp;", "&");
        public static string ToMultilineXmlStr(string value) => String.IsNullOrEmpty(value) ? "" :
            value.Replace("\r", "").Split('\n').Aggregate(new StringBuilder(), (s, n) => s.Append("<p>").Append(ToXmlStr(n)).Append("</p>\n")).ToString();
    }

Please note: for multiline values in XML you usually need to wrap each line in a <p> tag. So "<'&A'>\n<'&B'>" => "<p>&lt;&apos;&amp;A&apos;&gt;</p>\n<p>&lt;&apos;&amp;B&apos;&gt;</p>\n"

You can use System.Xml.Linq.XElement to encode special characters in XML.

Like this:

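using System;
using System.Xml.Linq;

var val = "test&<";
var node = new XElement("Node");
// The XElement.Value setter throws on null, hence the ?? fallback.
node.Value = val ?? node.Value;
Console.WriteLine(node.ToString());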

Output:

<Node>test&amp;&lt;</Node>

A ready-to-use XML escape function for .NET 5+:

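using System;
using System.Diagnostics.CodeAnalysis;
using System.Xml.Linq;

// "s" is given as a string literal: nameof() on a parameter inside
// a method attribute is only allowed from C# 11 on.
[return: NotNullIfNotNull("s")]
static string? XmlEscape(string? s)
{
    if (string.IsNullOrEmpty(s))
        return s;

    // XElement escapes the text; strip the "<X>" (3 chars) and "</X>" (4 chars) around it.
    var node = new XElement("X") { Value = s };
    return node.ToString()[3..^4];
}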

Usage example:

Console.WriteLine(XmlEscape("Hello < & >"));

The produced output:

Hello &lt; &amp; &gt;
