
Regex, comma and space handling

I need the following:

  • Collapse multiple spaces into one
  • Collapse multiple commas into one
  • Trim all commas and spaces from the start and end
  • Remove all spaces preceding a comma
  • Always have exactly one space after a comma
  • Never have more than one comma-and-space pair in a row

Put simply, this is just a basic sentence tidy with regard to commas and spaces.

My current solution works, but I'm wondering whether some of its seemingly redundant steps can be removed by using smarter regular expressions.

Current solution

using NUnit.Framework;
using System.Text.RegularExpressions;

// Fixture name is illustrative; the question only showed the method.
public class CleanSentenceTests
{
    [TestCase(" , aaa,bbb ,, , ccc, ddd,,  eee   fff , , ggg , hhh ,", ExpectedResult = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh")]
    [TestCase(",, aaa,bbb ,, , ccc, ddd,,  eee   fff , , ggg , hhh ,, ", ExpectedResult = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh")]
    [TestCase(",,  ,,", ExpectedResult = "")]
    public string CleanSentence(string source)
    {
        var duplicateSpaces = new Regex(@"[ ]{2,}", RegexOptions.None);
        var spacesBeforeCommas = new Regex(@"\s+(?=,)", RegexOptions.None);
        var duplicateCommas = new Regex(@"[,]{2,}", RegexOptions.None);
        var loneComma = new Regex(@",(?=[^\s])", RegexOptions.None);
        var multiCommaAndSpace = new Regex(@"(, ){2,}", RegexOptions.None);

        source = duplicateSpaces.Replace(source, " ");      // collapse runs of spaces
        source = duplicateCommas.Replace(source, ",");      // collapse runs of commas
        source = spacesBeforeCommas.Replace(source, "");    // drop whitespace before commas
        source = loneComma.Replace(source, ", ");           // add a space after commas that lack one
        source = multiCommaAndSpace.Replace(source, ", ");  // collapse repeated ", " pairs

        // Trim the crud
        source = source.Trim(',', ' ');

        return source;
    }
}

Test cases

var test1 = " , aaa,bbb ,, , ccc, ddd,,  eee   fff , , ggg , hhh ,";
var test2 = ",, aaa,bbb ,, , ccc, ddd,,  eee   fff , , ggg , hhh ,, ";
var test3 = ",,  ,,";

Intended results

var Result1 = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh";
var Result2 = "aaa, bbb, ccc, ddd, eee fff, ggg, hhh";
var Result3 = "";
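To see where the seemingly redundant steps come from, here is a sketch (a hypothetical CleanSentenceTrace helper, not part of the question) that applies the question's five patterns in the same order and prints the string after each step.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: runs the question's five Replace steps in the same
// order and prints the intermediate string after each one.
public static class CleanSentenceTrace
{
    public static string Run(string source)
    {
        var steps = new (string Name, Regex Pattern, string Replacement)[]
        {
            ("duplicateSpaces",    new Regex(@"[ ]{2,}"),    " "),
            ("duplicateCommas",    new Regex(@"[,]{2,}"),    ","),
            ("spacesBeforeCommas", new Regex(@"\s+(?=,)"),   ""),
            ("loneComma",          new Regex(@",(?=[^\s])"), ", "),
            ("multiCommaAndSpace", new Regex(@"(, ){2,}"),   ", "),
        };

        foreach (var (name, pattern, replacement) in steps)
        {
            source = pattern.Replace(source, replacement);
            // Brackets make leading and trailing spaces visible in the output.
            Console.WriteLine($"{name,-20} [{source}]");
        }

        // Final step from the question: trim commas and spaces from both ends.
        return source.Trim(',', ' ');
    }
}
```

On test1, for example, spacesBeforeCommas turns "bbb , , ccc" back into "bbb,, ccc" after duplicateCommas has already run. loneComma then produces "bbb, , ccc", which multiCommaAndSpace finally collapses to "bbb, ccc".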

Note: this is a quantifiable question. The aim is to reduce the number of steps by using smarter regular expressions.

Here is another solution that uses only built-in string functions plus a couple of Regex.Replace calls.

public string CleanString(string rawString)
{
    if (string.IsNullOrWhiteSpace(rawString)) return rawString;

    // Collapse every run of whitespace into a single space.
    rawString = Regex.Replace(rawString, @"\s+", " ");
    // Remove whitespace directly after or directly before a comma.
    rawString = Regex.Replace(rawString, @"(?<=,)\s+|\s+(?=,)", "");
    // Split on commas, drop empty items, and rejoin with ", ".
    return string.Join(", ", rawString.Trim().Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)).Trim();
}


I would suggest the following:

  1. Reduce runs of whitespace before a word to one space: replace @"\s+\b" with a single space.
  2. Take care of your lone commas (commas not followed by a space).
  3. Collapse the remaining runs of commas: replace @"[,\s]*," with ",".

This will also remove the spaces at the end of each item, i.e. the spaces before each comma.

Hope this helps.
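Taken literally, the three steps above can be sketched as follows. Two parts are assumptions, not part of the answer: step 2 is read as the question's loneComma pattern (a comma followed by a non-whitespace character gets a space), and a final Trim(',', ' ') is added because the three steps alone leave stray commas at the start and end of the test strings. The SuggestedSteps class name is hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical sketch of the three suggested steps. Step 2's pattern and the
// final Trim are assumptions, not taken from the answer.
public static class SuggestedSteps
{
    public static string Clean(string source)
    {
        // 1. Collapse any run of whitespace that ends just before a word character.
        source = Regex.Replace(source, @"\s+\b", " ");
        // 2. (Assumed) Give every comma followed by a non-space character a space.
        source = Regex.Replace(source, @",(?=\S)", ", ");
        // 3. Replace each run of commas/whitespace that ends in a comma with one comma.
        //    The whitespace that follows that last comma is left in place.
        source = Regex.Replace(source, @"[,\s]*,", ",");
        // (Assumed) Remove leftover commas and spaces at both ends.
        return source.Trim(',', ' ');
    }
}
```

Because step 3 ends each match at a comma, the space after it (from the input or from step 2) survives, so the output keeps exactly one space after each comma.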

I managed to get it down to three steps with some inspiration from John Woo:

 source = Regex.Replace(source, "[ ]{2,}", " ");
 source = Regex.Replace(source, "[, ]*,[, ]*", ", ");
 return source.Trim(',', ' '); 
  • Collapse runs of two or more spaces into one
  • Replace every run of commas and spaces that contains at least one comma with a single ", "
  • Trim to take care of commas and spaces at the start and end
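As a sketch, the two Replace calls can be folded into a single pass. One pattern has two alternatives: a run of commas and spaces containing at least one comma, or a run of two or more spaces. A MatchEvaluator then returns ", " or " " depending on which alternative matched. The OnePassCleaner name is hypothetical, and the final Trim is kept from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: one Regex pass plus the Trim from the answer.
public static class OnePassCleaner
{
    // Either a run of commas/spaces containing at least one comma,
    // or a run of two or more spaces with no comma in it.
    private static readonly Regex Separators = new Regex(@"[ ,]*,[ ,]*| {2,}");

    public static string Clean(string source)
    {
        // A match containing a comma becomes ", "; a pure run of spaces becomes " ".
        string result = Separators.Replace(source, m => m.Value.Contains(",") ? ", " : " ");
        return result.Trim(',', ' ');
    }
}
```

The comma alternative is listed first, so a run such as ",,  " before "eee" is treated as a separator rather than as plain spaces.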

Splitting on spaces and commas seems like it should be enough, provided the items themselves never contain spaces:

public string CleanSentence(string source)
{
   // Split on both commas and spaces, drop the empty pieces, and rejoin with ", ".
   return string.Join(", ", (source ?? "").Split(new[] { ',', ' ' }, StringSplitOptions.RemoveEmptyEntries));
}
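With the question's test1, splitting on both "," and " " also breaks "eee   fff" into two items, so the result contains "eee, fff" instead of the expected "eee fff". The following sketch (a hypothetical SplitCleaner class, not from any answer) keeps the split-and-join idea but splits on commas only. It then trims each item and collapses the item's inner spaces with a small Regex.Replace.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical variant: split on commas only, so spaces inside an item survive.
public static class SplitCleaner
{
    public static string Clean(string source)
    {
        var items = (source ?? "")
            .Split(new[] { ',' }, StringSplitOptions.RemoveEmptyEntries)
            // Trim spaces at the item's ends and collapse inner runs of spaces.
            .Select(item => Regex.Replace(item.Trim(' '), " {2,}", " "))
            // Drop items that consisted only of spaces.
            .Where(item => item.Length > 0);
        return string.Join(", ", items);
    }
}
```

Because the join supplies the ", " separator, no separate trim of the whole string is needed.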
