
Get removed characters from a string

I am using Regex to remove unwanted characters from a string, like this:

str = System.Text.RegularExpressions.Regex.Replace(str, @"[^\u0020-\u007E]", "");

How can I efficiently retrieve the distinct characters that will be removed?

EDIT:

Sample input  : str         = "This☺ contains Åüsome æspecialæ characters"
Sample output : str         = "This contains some special characters"
                removedchar = "☺,Å,ü,æ"
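// Run this on the original string, before calling Regex.Replace.
// The class is negated so it matches the characters that will be removed.
string pattern = @"[^\u0020-\u007E]";
Regex rgx = new Regex(pattern);
List<string> matches = new List<string>();

foreach (Match match in rgx.Matches(str))
{
    // Record each removed character only once.
    if (!matches.Contains(match.Value))
    {
        matches.Add(match.Value);
    }
}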
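
The same collection step can be written more compactly with LINQ. This is only a minimal sketch, assuming a hypothetical helper named DistinctRemoved; it uses the negated class from the question so that it matches the characters being removed:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RemovedCharsLinq
{
    // Hypothetical helper (not from the original answer): returns each
    // character that the question's Regex.Replace call would strip, once.
    public static string[] DistinctRemoved(string input)
    {
        return Regex.Matches(input, @"[^\u0020-\u007E]")
                    .Cast<Match>()          // needed where MatchCollection is non-generic
                    .Select(m => m.Value)   // the matched character as a string
                    .Distinct()             // drop repeats
                    .ToArray();
    }

    public static void Main()
    {
        var str = "This☺ contains Åüsome æspecialæ characters";
        Console.WriteLine(string.Join(",", DistinctRemoved(str)));
    }
}
```

For the sample input from the question this should list ☺, Å, ü and æ once each. Like the loop above, it scans the string separately from the Replace call, so the input is matched twice overall.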

Here is an example of how to do it with a callback method passed to the Regex.Replace overload that takes an evaluator:

evaluator
Type: System.Text.RegularExpressions.MatchEvaluator
A custom method that examines each match and returns either the original matched string or a replacement string.

C# demo:

using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public class Test
{
    public static List<string> characters = new List<string>();
    public static void Main()
    {
        // Repl takes the place of the "" replacement string.
        var str = Regex.Replace("§My string 123”˝", @"[^\u0020-\u007E]", Repl);
        Console.WriteLine(str); // => My string 123
        Console.WriteLine(string.Join(", ", characters)); // => §, ”, ˝
    }

    public static string Repl(Match m)
    {
        characters.Add(m.Value);
        return string.Empty;
    }
}

See IDEONE demo

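In short, declare a shared ("global") list of strings (here, characters) and initialize it. Add the Repl method to handle the replacement; each time Regex.Replace calls it, the matched value is appended to the characters list. Note that the list records every removed character, including repeats, so deduplicate it if only distinct characters are needed.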
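
Since the question asks for distinct characters, the evaluator can filter repeats itself. The following is a minimal sketch, assuming a hypothetical helper named Strip and a lambda in place of the static Repl method, so no shared static list is needed:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RemovedCharsEvaluator
{
    // Hypothetical helper (not from the original answer): strips characters
    // outside U+0020..U+007E and reports each removed character once,
    // in the order it was first seen.
    public static string Strip(string input, out List<string> removed)
    {
        var seen = new HashSet<string>();
        var ordered = new List<string>();

        string cleaned = Regex.Replace(input, @"[^\u0020-\u007E]", m =>
        {
            // HashSet.Add returns false for values already recorded.
            if (seen.Add(m.Value))
            {
                ordered.Add(m.Value);
            }
            return string.Empty;
        });

        removed = ordered;
        return cleaned;
    }

    public static void Main()
    {
        List<string> removed;
        string cleaned = Strip("This☺ contains Åüsome æspecialæ characters", out removed);
        Console.WriteLine(cleaned);                    // This contains some special characters
        Console.WriteLine(string.Join(",", removed));  // ☺,Å,ü,æ
    }
}
```

This does the removal and the collection in a single pass over the string. One caveat: the pattern matches UTF-16 code units, so a character outside the Basic Multilingual Plane would be recorded as two separate surrogate halves.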
