How to remove non-ASCII word from a string in C#

Question

I want to filter a string that contains some wrong (non-ASCII) letters. The string looks different in Notepad, Visual Studio 2010, and MySQL.

How can I check whether a string has non-ASCII letters, and how can I remove them?

Answer 1

You could use a regular expression to filter out non-ASCII characters:

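// Requires: using System.Text.RegularExpressions;
string input = "AB £ CD";
// Keep CR, LF, tab, and printable ASCII (0x20-0x7E); remove everything else.
string result = Regex.Replace(input, @"[^\x0d\x0a\x20-\x7e\t]", ""); // "AB  CD"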
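
The question also asks how to detect such characters before removing them. A minimal sketch, assuming the same set of allowed characters as in the replacement above; the variable names are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

// Same character class as the replacement pattern above:
// anything other than CR, LF, tab, or printable ASCII (0x20-0x7E).
const string disallowed = @"[^\x0d\x0a\x20-\x7e\t]";

string clean = "AB CD";
string dirty = "AB \u00A3 CD"; // "AB £ CD"

// IsMatch returns true if at least one disallowed character is present.
bool cleanHasDisallowed = Regex.IsMatch(clean, disallowed);
bool dirtyHasDisallowed = Regex.IsMatch(dirty, disallowed);

Console.WriteLine(cleanHasDisallowed); // False
Console.WriteLine(dirtyHasDisallowed); // True
```

Because the class also excludes ASCII control characters other than CR, LF, and tab, this check flags those as well, not only characters above 0x7F.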

Answer 2

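You could use regular expressions. Note that this pattern removes every character that is not an ASCII letter or digit, including spaces and punctuation: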

Regex.Replace(input, "[^a-zA-Z0-9]+", "")

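You could also use "\\W+" as the pattern to remove any non-word character. Be aware that \w is Unicode-aware in .NET by default, so non-ASCII letters such as é are kept unless you pass RegexOptions.ECMAScript.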
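
To make the difference between these patterns concrete, here is a small sketch. The sample string is made up for illustration and is not from the question:

```csharp
using System;
using System.Text.RegularExpressions;

string input = "Caf\u00E9 \u00A35"; // "Café £5"

// Removes everything except ASCII letters and digits, including the space.
string alnumOnly = Regex.Replace(input, "[^a-zA-Z0-9]+", "");

// By default \w is Unicode-aware in .NET, so the letter é is kept.
string wordDefault = Regex.Replace(input, @"\W+", "");

// With RegexOptions.ECMAScript, \w only matches [a-zA-Z_0-9].
string wordEcma = Regex.Replace(input, @"\W+", "", RegexOptions.ECMAScript);

Console.WriteLine(alnumOnly);   // Caf5
Console.WriteLine(wordDefault); // Café5
Console.WriteLine(wordEcma);    // Caf5
```

Neither pattern is a pure ASCII filter: both also drop spaces and punctuation, which may or may not be what you want.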

Answer 3

This has been a godsend:

Regex.Replace(input, @"[^\u0000-\u007F]", "");

I think I originally got it elsewhere, but the same answer appears here:

How can you strip non-ASCII characters from a string? (in C#)

Answer 4

First, you need to determine what you mean by a "word". If you mean non-ASCII, does that simply mean non-English?

Personally, I'd ask why you need to do this: what fundamental assumption in your application conflicts with your data? Depending on the situation, I suggest you either re-encode the text from its source encoding (although this will be a lossy conversion) or address that fundamental assumption so that your application handles the data correctly.
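
The lossy re-encoding option could look something like the sketch below. It assumes the text is already held in a .NET string and that silently dropping unrepresentable characters (rather than replacing them with '?') is acceptable; the variable names are illustrative.

```csharp
using System;
using System.Text;

string input = "AB \u00A3 CD"; // "AB £ CD"

// ASCII encoding that substitutes an empty string for any character
// it cannot represent, instead of the default '?'.
Encoding asciiDropping = Encoding.GetEncoding(
    "us-ascii",
    new EncoderReplacementFallback(string.Empty),
    new DecoderExceptionFallback());

// Round-trip through bytes: non-ASCII characters are lost on encoding.
byte[] bytes = asciiDropping.GetBytes(input);
string result = asciiDropping.GetString(bytes);

Console.WriteLine(result); // AB  CD (both spaces around the removed £ remain)
```

This only hides the symptom. If the characters are wrong because the data was decoded with the wrong encoding in the first place, fixing that decoding step keeps the information instead of discarding it.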

Answer 5

I think something as simple as this would probably work, wouldn't it?

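using System.Linq;

public static class StringExtensions
{
    // Keeps characters with code points up to 127 (ASCII) or, optionally, up to 255.
    public static string AsciiOnly(this string input, bool includeExtendedAscii)
    {
        int upperLimit = includeExtendedAscii ? 255 : 127;
        char[] asciiChars = input.Where(c => (int)c <= upperLimit).ToArray();
        return new string(asciiChars);
    }
}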

Example usage:

string input = "AB£ȼCD";
string asciiOnly = input.AsciiOnly(false); // returns "ABCD"
string extendedAsciiOnly = input.AsciiOnly(true); // returns "AB£CD"

5 answers

solution1
4 ACCEPTED 2010-09-13 08:49:25

solution2
1 2010-09-13 08:47:49

solution3
0 2013-03-05 20:33:49

solution4
0 2010-10-25 01:06:55

solution5
-1 2010-09-13 09:32:20
