
How to remove multiple, repeating & unnecessary punctuation from string in C#?

Considering strings like this:

"This is a string....!"
"This is another...!!"
"What is this..!?!?"
...
// There are LOTS of examples of weird/angry sentence-endings like the ones above.

I want to replace the unnecessary punctuation at the end to make it look like this:

"This is a string!"
"This is another!"
"What is this?"

What I basically do is:

- split by space
- check whether the last character of each string is a punctuation mark
- start replacing with the patterns below

I have tried a very big ".Replace(string, string)" function, but it does not work - there has to be a simpler regex I guess.

Documentation:

Returns a new string in which all occurrences of a specified string in the current instance are replaced with another specified string.

As well as:

Because this method returns the modified string, you can chain together successive calls to the Replace method to perform multiple replacements on the original string.

Something must be wrong here.

EDIT: ALL the proposed solutions work fine! Thank you very much! This one was the best suited solution for my project:

Regex re = new Regex("[.?!]*(?=[.?!]$)");
string output = re.Replace(input, "");

Your solution almost works; the only issue arises when the same sequence can be matched starting at different positions. For example, ..!?!? from your last line is not in the substitution list, so ..!? and !? are replaced by two separate matches, producing ?? in the output.
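The overlap can be reproduced with a two-entry chain. The question's actual substitution list is not shown here, so the two pairs below are hypothetical and only mirror the ..!? and !? cases just described. This is a minimal sketch, written with top-level statements (C# 9 or later assumed):

```csharp
using System;

// Hypothetical substitution chain: the question's real list is not shown,
// so these two pairs are assumptions that mirror the case described above.
string ChainedReplace(string s) =>
    s.Replace("..!?", "?")   // first pass consumes "..!?" and leaves "!?" behind
     .Replace("!?", "?");    // second pass rewrites the leftover "!?"

Console.WriteLine(ChainedReplace("What is this..!?!?")); // What is this??
```

Each Replace call only sees the result of the previous one, so no single entry ever covers the whole run of punctuation.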

It looks like your strategy is pretty straightforward: in a chain of multiple punctuation characters the last character wins. You can use regular expressions to do the replacement:

[!?.]*([!?.])

and replace it with $1, i.e. the capturing group that holds the last character:

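using System;
using System.Text.RegularExpressions;

// Read lines from standard input and collapse each run of punctuation to its last character.
string s;
while ((s = Console.ReadLine()) != null) {
    s = Regex.Replace(s, "[!?.]*([!?.])", "$1");
    Console.WriteLine(s);
}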


Simply

[.?!]*(?=[.?!]$)

should do it for you. For example:

Regex re = new Regex("[.?!]*(?=[.?!]$)");
Console.WriteLine(re.Replace("This is a string....!", ""));

This removes every trailing punctuation character except the last one.

[.?!]* matches any number of consecutive punctuation characters, and the (?=[.?!]$) is a positive lookahead making sure it leaves one at the end of the string.
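One practical difference between the two patterns proposed in this thread is where they apply: the capturing-group pattern collapses every run of punctuation in the string, while the lookahead pattern is anchored with $ and only touches the run at the end. The sketch below compares them on a made-up input; the helper names and the sample string are assumptions for illustration, and top-level statements (C# 9 or later) are assumed:

```csharp
using System;
using System.Text.RegularExpressions;

// Collapses every run of ., ! or ? anywhere in the string to its last character.
string CollapseEverywhere(string s) => Regex.Replace(s, "[!?.]*([!?.])", "$1");

// Removes all but the final mark of the run that ends the string.
string CollapseAtEnd(string s) => Regex.Replace(s, "[.?!]*(?=[.?!]$)", "");

string sample = "Wait... what..!?";   // hypothetical input with a run in the middle
Console.WriteLine(CollapseEverywhere(sample)); // Wait. what?
Console.WriteLine(CollapseAtEnd(sample));      // Wait... what?
```

If an ellipsis in the middle of a sentence should survive, the anchored version is the safer choice; if every run should be collapsed, the capturing-group version does that in one pass.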


Or you can do it without regular expressions:

    string TrimPuncMarks(string str)
    {
        HashSet<char> punctMarks = new HashSet<char>() {'.', '!', '?'};

        int i = str.Length - 1;
        for (; i >= 0; i--)
        {
            if (!punctMarks.Contains(str[i]))
                break;
        }

        // the very last punctuation mark, or null if the string does not end with one
        char? suffix = i < str.Length - 1 ? str[str.Length - 1] : (char?)null;

        return str.Substring(0, i+1) + suffix;
    }

    Debug.Assert("What is this?" == TrimPuncMarks("What is this..!?!?"));
    Debug.Assert("What is this" == TrimPuncMarks("What is this"));
    Debug.Assert("What is this." == TrimPuncMarks("What is this."));
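The same "last character wins" rule can also be written more compactly with String.TrimEnd. This is a hypothetical variant, not part of the answer above: the name TrimPuncMarksShort is made up, and top-level statements (C# 9 or later) are assumed.

```csharp
using System;

// Hypothetical shorter variant of TrimPuncMarks built on String.TrimEnd.
string TrimPuncMarksShort(string str)
{
    char[] marks = { '.', '!', '?' };
    string trimmed = str.TrimEnd(marks);   // drop the whole trailing run
    // If nothing was trimmed, there is no trailing mark to keep.
    return trimmed.Length == str.Length
        ? str
        : trimmed + str[str.Length - 1];   // re-append the very last mark
}

Console.WriteLine(TrimPuncMarksShort("What is this..!?!?")); // What is this?
```

It trades the explicit loop for a library call, at the cost of allocating an intermediate string.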
