C# Remove unwanted characters from a string

I have looked into other posts, and all of them have known unwanted characters. In my case, I have a bunch of characters that I want, and I only want to keep those.

My code is way too messy:

private string RemoveUnwantedChar(string input)
{
    string correctString = "";

    for (int i = 0; i < input.Length; i++)
    {
        if (char.IsDigit(input[i]) || input[i] == '.' || input[i] == '-' || input[i] == 'n'
                || input[i] == 'u' || input[i] == 'm' || input[i] == 'k' || input[i] == 'M'
                || input[i] == 'G' || input[i] == 'H' || input[i] == 'z' || input[i] == 'V'
                || input[i] == 's' || input[i] == '%')
            correctString += input[i];
    }
    return correctString;
}

Characters that I want: 0123456789 and numkMGHzVs%-.

You can use LINQ:

// Every character to keep, including '%', '-' and '.' from the question
var allowedChars = "0123456789numkMGHzVs%-.";
var result = String.Join("", input.Where(c => allowedChars.Any(x => x == c)));

Another option:

var result = String.Join("", input.Where(c => allowedChars.Contains(c)));

You can use String.Concat + Enumerable.Where with HashSet<T>.Contains:

HashSet<char> AllowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");
private string RemoveUnwantedChar(string input)
{
    return string.Concat(input.Where(AllowedChars.Contains));
}

Here's another efficient approach using a StringBuilder and a HashSet<T>:

HashSet<char> AllowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");
private string RemoveUnwantedChar(string input)
{
    StringBuilder sb = new StringBuilder(input.Length);
    foreach (char c in input)
        if (AllowedChars.Contains(c))
            sb.Append(c);
    return sb.ToString();
}

You could do something like this:

// Lookup set of allowed characters (a string is already an IEnumerable<char>)
private static readonly HashSet<char> _allowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");

private string FilterString(string str)
{
    // Temporary buffer, sized for the worst case where every character is kept
    char[] buffer = new char[str.Length];
    int index = 0;

    // Copy only the allowed characters into the buffer
    foreach (var ch in str)
        if (_allowedChars.Contains(ch))
            buffer[index++] = ch;

    // Build the result from the filled part of the buffer
    return new String(buffer, 0, index);
}

So the trick is to create a HashSet to validate each character. The 'messy' way, as you called it, creates a new string on every append, which produces a lot of garbage and can fragment memory. Also try to avoid long chains of conditions (which is what you wanted to get rid of).
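
To make the point above concrete, here is a minimal self-contained sketch of the lookup-set idea, using a StringBuilder in place of the char buffer. The class name UnitFilter and the method name Keep are made up for illustration and do not appear in the question or the other answers.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper; the names UnitFilter and Keep are assumptions for this sketch.
public static class UnitFilter
{
    // Built once; each membership test is an O(1) hash lookup on average.
    private static readonly HashSet<char> Allowed =
        new HashSet<char>("0123456789numkMGHzVs%-.");

    public static string Keep(string input)
    {
        // One growable buffer instead of allocating a new string per appended character.
        var sb = new StringBuilder(input.Length);
        foreach (char c in input)
        {
            if (Allowed.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With this sketch, UnitFilter.Keep("Freq: 2.4 GHz") returns "2.4GHz". Note that the filter looks at each character in isolation, so UnitFilter.Keep("-12.5 mV (max)") returns "-12.5mVm": the 'm' of "max" is in the allowed set and survives.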


If you like LINQ, you could do something like:

// Lookup set of allowed characters (a string is already an IEnumerable<char>)
private static readonly HashSet<char> _allowedChars = new HashSet<char>("0123456789numkMGHzVs%-.");

private string FilterString2(string str)
{
    return new String(
        str.Where(ch => _allowedChars.Contains(ch)).ToArray());
}

But this makes it less readable.

If you are using LINQ you could do this:

char[] validChars = "0123456789numkMGHzVs%-.".ToArray();
var newString = "Teststring012";

string filtered = string.Join("", newString.Where(x => validChars.Contains(x)));

I like this clear and readable Regex solution.

public string RemoveUnwantedChar(string input) {
    return Regex.Replace(input, "[^0-9numkMGHzVs%\\-.]", "");
}
