
How can I match and remove the backslash "\" and "\n" characters using the .NET Regex library?

I get the XML from a web service in the format below and I want to clean it up (remove the extra "\" and "\n" characters) before working with it. I am currently using the regular expression below to match. However, only the "\n" characters are cleaned up, while the "\" characters between the equals signs and the double quotation marks persist.

What do you advise me to do?

private string ValidateXml(string dirtyXml) {
    Regex regex = new Regex(@"[\\\][\n]");
    var cleanXml = regex.Replace(dirtyXml, "");
    return cleanXml;
}

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<ISBNdb server_time=\"2010-01-28T11:31:08Z\">\n<BookList total_results=\"1\" page_size=\"10\" page_number=\"1\" shown_results=\"1\">\n<BookData book_id=\"quantitative_techniques\" isbn=\"0826458548\" isbn13=\"9780826458544\">\n<Title>Quantitative techniques</Title>\n<TitleLong></TitleLong>\n<AuthorsText>Terry Lucey</AuthorsText>\n<PublisherText publisher_id=\"continuum\">London : Continuum, 2002.</PublisherText>\n</BookData>\n</BookList>\n</ISBNdb>\n"

The question still isn't clear: if you write the XML string (before you try to clean it) to the console, do you see exactly what you posted above, with all those \" and \n sequences? Does the displayed string start and end with a quotation mark? If so, you probably want to remove the opening and closing quotation marks and all the backslashes, and if any backslash is followed by an 'n', you want to remove that as well. Here's some code to demonstrate:

static void Main(string[] args)
{
  string dirtyXml = @"""<?xml version=\""1.0\"" encoding=\""UTF-8\""?>\n\n<ISBNdb server_time=\""2010-01-28T11:31:08Z\"">\n<BookList total_results=\""1\"" page_size=\""10\"" page_number=\""1\"" shown_results=\""1\"">\n<BookData book_id=\""quantitative_techniques\"" isbn=\""0826458548\"" isbn13=\""9780826458544\"">\n<Title>Quantitative techniques</Title>\n<TitleLong></TitleLong>\n<AuthorsText>Terry Lucey</AuthorsText>\n<PublisherText publisher_id=\""continuum\"">London : Continuum, 2002.</PublisherText>\n</BookData>\n</BookList>\n</ISBNdb>\n""";
  Console.WriteLine(dirtyXml);
  Console.WriteLine();
  Console.WriteLine(Regex.Replace(dirtyXml, @"^""|""$|\\n?", ""));
}

Output:

"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<ISBNdb server_time=\"2010-01-28T11:31:08Z\">\n<BookList total_results=\"1\" page_size=\"10\" page_number=\"1\" shown_results=\"1\">\n<BookData book_id=\"quantitative_techniques\" isbn=\"0826458548\" isbn13=\"9780826458544\">\n<Title>Quantitative techniques</Title>\n<TitleLong></TitleLong>\n<AuthorsText>Terry Lucey</AuthorsText>\n<PublisherText publisher_id=\"continuum\">London : Continuum, 2002.</PublisherText>\n</BookData>\n</BookList>\n</ISBNdb>\n"

<?xml version="1.0" encoding="UTF-8"?><ISBNdb server_time="2010-01-28T11:31:08Z"><BookList total_results="1" page_size="10" page_number="1" shown_results="1"><BookData book_id="quantitative_techniques" isbn="0826458548" isbn13="9780826458544"><Title>Quantitative techniques</Title><TitleLong></TitleLong><AuthorsText>Terry Lucey</AuthorsText><PublisherText publisher_id="continuum">London : Continuum, 2002.</PublisherText></BookData></BookList></ISBNdb>

Does this accurately reflect what you're starting with and what you want to end up with?
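
Before choosing a pattern, it may help to check whether the backslashes are really in the data or only appear in a debugger's escaped view of the string. The following is a minimal sketch of such a check; the class and method names are hypothetical and not part of the original question.

```csharp
using System;

public static class BackslashCheck
{
    // Returns true only if the string data itself contains a backslash character.
    public static bool HasLiteralBackslash(string s)
    {
        return s.IndexOf('\\') >= 0;
    }

    public static void Main()
    {
        // Regular literal: \" and \n are C# escapes, so the data holds a quote
        // and a real newline, and no backslash at all.
        string escapedInSource = "<a b=\"1\">\n</a>";

        // Verbatim literal: the backslashes are part of the data.
        string literalBackslashes = @"<a b=\""1\"">\n</a>";

        Console.WriteLine(HasLiteralBackslash(escapedInSource));    // False
        Console.WriteLine(HasLiteralBackslash(literalBackslashes)); // True
    }
}
```

If the check returns false for the web service response, there is nothing to strip except possibly real newline characters.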

Your regex is a bit odd: it is a single character class, so it matches any one of the following characters:

  • \\ a single backslash character
  • \] a single ] character
  • [ a single [ character
  • \n a newline character (a real line feed, not the two characters \ and n)
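
To see this behaviour directly, the pattern from the question can be tested against single-character inputs. This is only an illustrative sketch; the class name and field name are made up for the example, and the pattern is read as one character class containing a backslash, `]`, `[`, and a newline.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CharacterClassDemo
{
    // The pattern from the question, unchanged.
    public static readonly Regex QuestionRegex = new Regex(@"[\\\][\n]");

    public static void Main()
    {
        // Each of these one-character inputs is a member of the class.
        Console.WriteLine(QuestionRegex.IsMatch("\\")); // True (backslash)
        Console.WriteLine(QuestionRegex.IsMatch("]"));  // True
        Console.WriteLine(QuestionRegex.IsMatch("["));  // True
        Console.WriteLine(QuestionRegex.IsMatch("\n")); // True (real newline)

        // The letter n alone is not in the class.
        Console.WriteLine(QuestionRegex.IsMatch("n"));  // False
    }
}
```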

The following regex will match what you described:

@"\\n?"

It matches either a literal \n or a lone \. Note that the backslash will match even when it is not followed by a quote. To match only the backslashes that are followed by a quote, you can use this pattern:

@"(\\n)|(\\(?=""))"
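
The difference between the two patterns shows up when the input contains a backslash that is followed by neither a quote nor an n. The sketch below compares them on a made-up sample string; the class name, method names, and sample are assumptions for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BackslashPatterns
{
    // Removes every backslash, together with a following 'n' if there is one.
    public static string RemoveAll(string s)
    {
        return Regex.Replace(s, @"\\n?", "");
    }

    // Removes literal \n sequences and only those backslashes that precede a quote.
    public static string RemoveBeforeQuote(string s)
    {
        return Regex.Replace(s, @"(\\n)|(\\(?=""))", "");
    }

    public static void Main()
    {
        // Hypothetical sample: the backslash in C:\dir is legitimate data.
        string sample = @"<a path=\""C:\dir\"">\n</a>";

        Console.WriteLine(RemoveAll(sample));         // <a path="C:dir"></a>
        Console.WriteLine(RemoveBeforeQuote(sample)); // <a path="C:\dir"></a>
    }
}
```

Neither pattern can tell an escaped line break from a backslash that happens to precede the letter n (for example in a path such as C:\new), so both assume the data contains no such text.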

You don't really need a regex for this; a couple of calls to String.Replace will do.

This should do the trick:

var cleanXml = dirtyXml.Replace("\\n", "").Replace("\\\"", "\"");
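
As a rough way to confirm the result, the cleaned string can be handed to an XML parser. The sketch below wraps the two Replace calls in a hypothetical helper and parses a shortened, made-up sample with XDocument.Parse; it assumes the input has no surrounding quotation marks, which this approach would leave in place.

```csharp
using System;
using System.Xml.Linq;

public static class ReplaceCleaner
{
    public static string Clean(string dirtyXml)
    {
        return dirtyXml
            .Replace("\\n", "")      // drop the two-character sequence backslash + n
            .Replace("\\\"", "\"");  // turn backslash + quote into a plain quote
    }

    public static void Main()
    {
        // Shortened sample in the same style as the question's data.
        string dirty = @"<Book isbn=\""0826458548\"">\n<Title>Quantitative techniques</Title>\n</Book>\n";
        string clean = Clean(dirty);

        Console.WriteLine(clean);
        // <Book isbn="0826458548"><Title>Quantitative techniques</Title></Book>

        // Parsing throws an XmlException if the cleaned text is not well-formed.
        XDocument doc = XDocument.Parse(clean);
        Console.WriteLine(doc.Root.Attribute("isbn").Value); // 0826458548
    }
}
```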

It looks like you want a | in that pattern, to match either \n or \.

Try this:

[\\][n]|[\\]
