Remove unwanted unicode characters from string

Question

I have looked at quite a number of related SO posts on this. I have a malformed string containing unwanted Unicode characters that I want to strip out.

string testString = "\0\u0001\0\0\0����\u0001\0\0\0\0\0\0\0\u0011\u0001\0\0\0\u0004\0\0\0\u0006\u0002\0\0\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v";

I would like the following output:

The boy kicked the Ball

How can I achieve this?

I have looked at the following, without much success:

How can you strip non-ASCII characters from a string? (in C#)
Converting unicode characters (C#) Testing
How to Remove '\0' from a string in C#?
Removing unwanted character from column (SQL Server related, so not relevant to my question)

Answer 1

testString = Regex.Replace(testString, @"[\u0000-\u0008\u000A-\u001F\u0100-\uFFFF]", "");

Or

testString = Regex.Replace(testString, @"[^\t\r\n -~]", "");
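A minimal sketch of the whitelist approach, assuming the pattern `[^\t\r\n -~]` (keep tab, CR, LF and printable ASCII). The class and method names are hypothetical, and the sample is an abbreviated version of the string in the question. Note that plain deletion joins neighbouring words, so the second method, which is an assumption and not part of the answer, replaces each run of unwanted characters with a single space instead.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AsciiCleaner
{
    // Deletes everything except tab, CR, LF and printable ASCII (space through '~').
    public static string StripNonPrintable(string input)
    {
        return Regex.Replace(input, @"[^\t\r\n -~]", "");
    }

    // Variant (an assumption, not from the answer): collapse each run of
    // characters outside printable ASCII into one space, then trim the ends,
    // so that adjacent words stay separated.
    public static string ReplaceRunsWithSpace(string input)
    {
        return Regex.Replace(input, @"[^ -~]+", " ").Trim();
    }
}
```

For the abbreviated sample `"\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v"`, `StripNonPrintable` returns `"Theboy\tKickedthe Ball"`, while `ReplaceRunsWithSpace` returns `"The boy Kicked the Ball"`.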

Answer 2

// Requires: using System.Text;
public string ReturnCleanASCII(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    foreach (char c in s)
    {
        if ((int)c > 127) // non-ASCII; you probably don't want 127 (DEL) either
            continue;
        if ((int)c < 32)  // I bet you don't want control characters
            continue;
        if (c == '%')
            continue;
        if (c == '?')
            continue;
        sb.Append(c);
    }

    return sb.ToString();
}

Answer 3

Try this:

string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);

Hope it helps.
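A small sketch of what this pattern does and does not remove. The class and method names are hypothetical. The pattern strips everything above U+007F, but ASCII control characters such as `\0` fall inside the kept range, so they survive.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class NonAsciiStripper
{
    // Removes every run of characters outside U+0000..U+007F.
    // Control characters (for example '\0') are inside that range and are kept.
    public static string RemoveNonAscii(string input)
    {
        return Regex.Replace(input, @"[^\u0000-\u007F]+", string.Empty);
    }
}
```

`RemoveNonAscii("søme string")` returns `"sme string"`, and `RemoveNonAscii("a\0b\uFFFDc")` returns `"a\0bc"`, with the null character still present. For the string in the question, a further pass over control characters would probably be needed.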

Answer 4

Rather than trying to remove the unicode characters, why not extract all the ASCII characters instead:

var str = string.Join(" ",new Regex("[ -~]+").Matches(testString).Select(m=>m.Value));
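The same extraction idea as a self-contained sketch. The class and method names are hypothetical, and the sample is an abbreviated version of the string in the question. The `Cast<Match>()` call is an addition for older frameworks where `MatchCollection` does not implement `IEnumerable<Match>`; on newer runtimes it should be harmless.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class AsciiExtractor
{
    // Finds every run of printable ASCII (space through '~')
    // and joins the runs with a single space.
    public static string ExtractPrintableRuns(string input)
    {
        var runs = Regex.Matches(input, "[ -~]+")
                        .Cast<Match>()          // needed where MatchCollection is non-generic
                        .Select(m => m.Value);
        return string.Join(" ", runs);
    }
}
```

For the abbreviated sample `"\0\u0005The\u0006\u0003\0\0\0\u0017boy\u0006\u0004\0\0\0\tKicked\u0006\u0005\0\0\0\u0013the Ball\v"` this returns `"The boy Kicked the Ball"`. The capital K comes from the input; nothing here changes case.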

Answer 5

I use this regular expression to filter out bad characters in file names.

Regex.Replace(directory, @"[^a-zA-Z0-9\\:_\- ]", "");
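A short sketch of this whitelist in use, written with a verbatim string so the backslash escapes reach the regex engine unchanged. The class name, method name and sample path are hypothetical. The pattern keeps only letters, digits, backslash, colon, underscore, hyphen and space, so a dot is stripped as well.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
public static class PathSanitizer
{
    // Removes every character that is not a letter, digit,
    // backslash, colon, underscore, hyphen or space.
    public static string Sanitize(string directory)
    {
        return Regex.Replace(directory, @"[^a-zA-Z0-9\\:_\- ]", "");
    }
}
```

`Sanitize(@"C:\Data\re*port<2020>")` returns `@"C:\Data\report2020"`. Because `.` is not in the whitelist, a file extension would lose its dot, so the pattern may need extending for full file names.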

5 answers

solution1
1 ACCEPTED 2020-06-26 04:29:48

solution2
1 2022-07-27 08:03:07

solution3
0 2020-06-26 04:14:53

solution4
0 2020-06-26 04:28:59

solution5
0 2020-06-26 04:34:02
