How to remove Unicode characters from a string

Question

Suppose we have a string like the following:

string s = "此檢查項己被你忽略，請聯系醫生。\u2028內科";

How can I remove Unicode characters such as \u2028 from the string?

I have tried the following approaches. Unfortunately, none of them work. Please help. Thanks.

Unicode strings

Converting a Unicode string to an escaped ASCII string

Replacing Unicode escape sequences in a string

Update

Why doesn't the code below work for me?

Update: I tried displaying the string in the output. The character is a line separator.
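The update is the key detail: because the string in the question is a regular (non-verbatim) C# literal, the compiler has already replaced \u2028 with the actual U+2028 LINE SEPARATOR character, so no backslash text remains for an escape-sequence replacement to find. A minimal sketch, assuming the goal is to drop that real character; the helper name RemoveSeparators is hypothetical, and also dropping U+2029 PARAGRAPH SEPARATOR is an added assumption:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical helper: keeps every char except those whose Unicode category is
// LineSeparator (U+2028) or ParagraphSeparator (U+2029, an assumed extra).
static string RemoveSeparators(string input) =>
    new string(input
        .Where(c =>
        {
            UnicodeCategory cat = char.GetUnicodeCategory(c);
            return cat != UnicodeCategory.LineSeparator
                && cat != UnicodeCategory.ParagraphSeparator;
        })
        .ToArray());

// Regular literal: "\u2028" here is already the real separator character.
string s = "此檢查項己被你忽略，請聯系醫生。\u2028內科";
string cleaned = RemoveSeparators(s);

Console.WriteLine(cleaned);
```

If only U+2028 matters, s.Replace("\u2028", string.Empty) removes that single character directly.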

Answer 1

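As @spender pointed out in the comments above:

The basic premise of your question (removing Unicode) is flawed, because all strings are stored in memory as Unicode (UTF-16 in .NET). Every character is a Unicode character.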
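To illustrate that premise: a .NET string is a sequence of UTF-16 code units (char values), and \u2028 in a regular literal becomes a single char with the value 0x2028, not six characters of text. A small sketch that prints each code unit of the end of the question's string:

```csharp
using System;

// End of the question's string: ideographic full stop, U+2028, then 內科.
string tail = "。\u2028內科";

// Each char is one UTF-16 code unit; print its numeric value as U+XXXX.
foreach (char c in tail)
{
    Console.WriteLine($"U+{(int)c:X4}");
}
```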

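However, if you want to replace or remove unescaped substrings of the form \uXXXX (a literal backslash, the letter u, and four hexadecimal digits), you can use a regular expression pattern such as @"\\u[0-9A-Fa-f]{4}".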

Here is a complete example:

using System.Diagnostics;
using System.Text.RegularExpressions;

string noUnicode = "此檢查項己被你忽略，請聯系醫生。內科";

// If you hard-code the string, you MUST put an `@` before it; otherwise the compiler
// treats "\u2028" as an escape sequence and converts it to the corresponding Unicode character.
string s = @"此檢查項己被你忽略，請聯系醫生。\u2028內科";

// Remove every literal \uXXXX sequence (backslash, 'u', four hex digits).
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);

Debug.Print("s = " + s);
Debug.Print("ss = " + ss);

Debug.Print((ss == noUnicode).ToString());

Here is a fiddle to test it, and this is its output:

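Note: because the string is hard-coded, the @ is required here to keep the substring "\u2028" from being converted to the corresponding Unicode character. If, on the other hand, you get the original string from somewhere else (for example, by reading it from a text file), the substring "\u2028" is already stored as literal text, so there should be no problem and the code above should work as-is.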
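The difference the note describes can be checked directly by comparing the same text written as a regular literal and as a verbatim literal. A small sketch; the sample strings "A\u2028B" and @"A\u2028B" are illustrative only:

```csharp
using System;
using System.Text.RegularExpressions;

string regular = "A\u2028B";   // escape processed: 3 chars, the middle one is U+2028
string verbatim = @"A\u2028B"; // no escape processing: 8 chars, "\u2028" kept as text

// The answer's pattern: a literal backslash, 'u', then four hex digits.
const string pattern = @"\\u[0-9A-Fa-f]{4}";

Console.WriteLine($"{regular.Length} {Regex.IsMatch(regular, pattern)}");
Console.WriteLine($"{verbatim.Length} {Regex.IsMatch(verbatim, pattern)}");
Console.WriteLine(Regex.Replace(verbatim, pattern, string.Empty));
```

The pattern only ever sees the verbatim version as an escape sequence; in the regular version there is nothing for it to match.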

So something like this would work exactly the same way:

using System.IO;
using System.Text.RegularExpressions;

string s = File.ReadAllText(@"Path\to\a\Unicode\text\file\containing\the\string\'\u2028'");
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);
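A self-contained round trip of the file-based variant, assuming a temporary file created with Path.GetTempFileName() stands in for the real text file (the file and its contents are hypothetical):

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical temp file standing in for the real text file.
string path = Path.GetTempFileName();
File.WriteAllText(path, @"此檢查項己被你忽略，請聯系醫生。\u2028內科", Encoding.UTF8);

// Text read from disk keeps "\u2028" as six literal characters.
string s = File.ReadAllText(path, Encoding.UTF8);
string ss = Regex.Replace(s, @"\\u[0-9A-Fa-f]{4}", string.Empty);
File.Delete(path);

Console.WriteLine(ss);
```

No @ is needed on the data itself here; the verbatim literal is only used to write the sample file.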

Solution 1
1 2018-03-03 11:17:04