
Need to remove white space between two tags with Regex

I have a bit of XML that I would like to strip the outer whitespace from. As a preface: the output is not well-formed XML; it's a proprietary spec I am required to deal with.

The sample is:

<mattext>
  <span>A</span>
  <span>more text</span>
 </mattext>

What I need is:

<mattext><span>A</span>
  <span>more text</span></mattext>

That is, all whitespace between the opening <mattext> and the first bit of inner content should be gone, and likewise the whitespace before the closing </mattext>.

I've tried:

var output = Regex.Replace(input, @"<mattext>*<", "<mattext>", 
             RegexOptions.Multiline);

But I'm not having any luck. Can anyone advise?

Thanks!

Try using:

var output = Regex.Replace(input, @"(?<=<mattext>)\s*|\s*(?=</mattext>)", "");


(?<=<mattext>) is a positive lookbehind and makes sure there is <mattext> before the spaces and newlines.

(?=</mattext>) is a positive lookahead and makes sure there is </mattext> after the spaces and newlines.
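As a rough check, the lookaround pattern above can be run against the sample from the question. This is only a sketch: the class and method names (MattextTrimmer, TrimInner) are made up for illustration, and the input assumes Windows-style \r\n line endings.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MattextTrimmer
{
    // Hypothetical helper: removes whitespace directly after <mattext>
    // and directly before </mattext>, leaving inner whitespace alone.
    public static string TrimInner(string input)
    {
        return Regex.Replace(input, @"(?<=<mattext>)\s*|\s*(?=</mattext>)", "");
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample from the question, assuming \r\n line endings.
        string input =
            "<mattext>\r\n  <span>A</span>\r\n  <span>more text</span>\r\n </mattext>";

        // The whitespace between the two <span> elements is preserved.
        Console.WriteLine(MattextTrimmer.TrimInner(input));
    }
}
```

Because the lookarounds do not consume the tags, the replacement string can simply be empty and the tags themselves stay untouched.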

var output = Regex.Replace(input, @"<mattext>\s*<", "<mattext><", RegexOptions.Multiline);

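Similar to @Jerry's answer, with an additional guard that <mattext> is at the start of a line and </mattext> is at the end of one (with RegexOptions.Multiline, ^ and $ match at line boundaries, not only at the start and end of the input). Note that [^<]* and [^>]* match any characters other than the angle bracket, not just whitespace.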

Regex.Replace(input,
  @"(?:(?<=^\<mattext\>)[^\<]*)|(?:[^\>]*(?=\</mattext\>$))",
  string.Empty,
  RegexOptions.Multiline);

It isn't spaces; it's \r or \n, or even both together as \r\n.
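The difference can be seen with a small comparison, sketched here under the assumption that the input uses \r\n line endings. The class and method names are hypothetical. A pattern that only matches literal spaces removes nothing, because the first character after <mattext> is \r, while \s also covers \r and \n.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LineEndingDemo
{
    // Matches literal space characters only; stops at \r or \n.
    public static string StripSpacesOnly(string input)
    {
        return Regex.Replace(input, @"(?<=<mattext>) *", "");
    }

    // \s matches spaces, tabs, \r and \n.
    public static string StripWhitespace(string input)
    {
        return Regex.Replace(input, @"(?<=<mattext>)\s*", "");
    }

    public static void Main()
    {
        string input = "<mattext>\r\n  <span>A</span>";

        // Unchanged: the character right after <mattext> is \r, not a space.
        Console.WriteLine(StripSpacesOnly(input) == input);

        // The line break and indentation after <mattext> are removed.
        Console.WriteLine(StripWhitespace(input));
    }
}
```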
