
C# Regex.Replace: need to replace 3 or more spaces

My example input text file:

92721662,5819.53,2019 - 10 - 10,04332977,5938.30,.00,118.77 -

92721664,5510.56,2019 - 10 - 10,04332978,5623.02,.00,112.46 -

92730321,22805.90,2019 - 10 - 15,04354360,23350.20,.00,544.30 -

The last regex I have tried is:

var requestbody3 = Regex.Replace(requestbody2, @" { 3 ,}[\r\n]", "");

Here requestbody2 is the result of calling File.ReadAllText() on the "testinput.txt" file.

The goal is to remove only the blank lines that contain 3 or more spaces and end with \r\n, leaving the individual data lines without gaps between them.

You can avoid Regex entirely for this, which I highly suggest.

Instead of reading your file as one giant string, get the lines using the built-in method File.ReadLines(). Then, to remove blank lines, just use LINQ.

So all together your code should just be:

IEnumerable<string> lines = File.ReadLines("testinput.txt").Where(line => !string.IsNullOrWhiteSpace(line));
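As a slightly fuller sketch of the same idea, the filter can be wrapped in a small helper and tried on in-memory lines. The class name BlankLineFilter, the method name RemoveBlankLines and the output file name "testoutput.txt" are assumptions made for illustration; they do not come from the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class BlankLineFilter
{
    // Keeps only lines that contain at least one non-whitespace character.
    public static IEnumerable<string> RemoveBlankLines(IEnumerable<string> lines) =>
        lines.Where(line => !string.IsNullOrWhiteSpace(line));

    public static void Main()
    {
        // In-memory stand-in for File.ReadLines("testinput.txt").
        string[] sample = { "92721662,5819.53", "     ", "92721664,5510.56", "" };
        foreach (string line in RemoveBlankLines(sample))
            Console.WriteLine(line);

        // With real files (the output name is hypothetical):
        // File.WriteAllLines("testoutput.txt",
        //     RemoveBlankLines(File.ReadLines("testinput.txt")));
    }
}
```

Because File.ReadLines() is lazy, it is safer to write the result to a different file than the one being read.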

The crux of your problem is that the regex contains extraneous white space and isn't behaving as a "three or more" quantifier. Simply don't put spaces inside the curly brackets:

//three or more spaces followed by windows or unix newline
" {3,}\r?\n"

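A minimal sketch of the corrected pattern in use, assuming the input has been loaded into a string as in the question. The class name SpaceRunRemover and the sample text are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceRunRemover
{
    // Three or more spaces followed by a Windows (CRLF) or Unix (LF) newline.
    private static readonly Regex SpacesThenNewline = new Regex(@" {3,}\r?\n");

    public static string Remove(string text) => SpacesThenNewline.Replace(text, "");

    public static void Main()
    {
        // Hypothetical sample: data lines separated by lines of five spaces.
        string input = "a,1\r\n     \r\nb,2\r\n     \r\nc,3\r\n";
        Console.Write(Remove(input));
    }
}
```

Note that this unanchored pattern would also match three or more trailing spaces at the end of a line that contains text, which would join that line to the next one.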
Consider also:

  • use \s instead of a literal space if tabs should count as well; be aware that \s matches any whitespace character, including CR and LF
  • don't use [\r\n]: it means "one of CR or LF", so on a file with CRLF line endings it matches and removes the CR but leaves the LF. The file still has its empty lines and ends up with mixed line endings. Instead, match zero or one CR followed by one LF: \r?\n
  • per Pluto's comment, you could start your regex with a caret so that it doesn't match lines that contain some text and then end with 3 or more spaces: ^\s{3,}\r?\n. Note that you'll also need to enable RegexOptions.Multiline so that ^ matches at the start of every line; by default the engine treats the entire input as one string, so ^ only matches at the start of the file
  • alternatively, you can use a positive lookbehind so that only runs of whitespace preceded by a newline character are matched. The preceding newline is not part of the match, so it isn't replaced: (?<=\n)\s{3,}\r?\n. The downside is that this can't match on the very first line of the file, so it needs one more extension to say "match the start of input or a newline, followed by 3+ whitespace characters, followed by LF/CRLF": (^|(?<=\n))\s{3,}\r?\n
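The two anchored variants from the list above can be compared side by side. This is only a sketch: the class name WhitespaceLineRemover, the method names and the sample text are assumptions for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitespaceLineRemover
{
    // With RegexOptions.Multiline, ^ matches at the start of every line.
    private static readonly Regex Anchored =
        new Regex(@"^\s{3,}\r?\n", RegexOptions.Multiline);

    // Without Multiline: start of input, or a position just after a newline.
    private static readonly Regex LookBehind =
        new Regex(@"(^|(?<=\n))\s{3,}\r?\n");

    public static string RemoveAnchored(string text) => Anchored.Replace(text, "");

    public static string RemoveLookBehind(string text) => LookBehind.Replace(text, "");

    public static void Main()
    {
        // The first line has text followed by trailing spaces, so it should survive;
        // only the whitespace-only line in the middle should be removed.
        string input = "a,1   \r\n     \r\nb,2\r\n";
        Console.Write(RemoveAnchored(input));
        Console.Write(RemoveLookBehind(input));
    }
}
```

Both variants should leave the "a,1   " line intact, because the match can only begin at the start of a line.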

Overkill, but a nice learning journey. Maybe consider using one of the routes suggested that doesn't use regex :)
