How to replace whitespace with a regular expression in C# (convert Unicode to UTF-8)

Question

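I am trying to do a regex replace in C#. The method I am writing should replace certain Unicode whitespace characters with a normal space in UTF-8.

Let me explain with code. I am not good with regular expressions or culture info.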

    //This method replace white spaces in unicode by whitespaces UTF-8
    public static string cleanUnicodeSpaces(string value)
    {
        //This first pattern works but, remove other special characteres
        //For example: mark accents
        //string pattern = @"[^\u0000-\u007F]+"; 
        string cleaned = ""; 
        string pattern = @"[^\u0020\u0009\u000D]+"; //Unicode characters
        string replacement = ""; //Replace by UTF-8 space
        Regex regex = new Regex(pattern);
        cleaned = regex.Replace(value, replacement).Trim(); //Trim by quit spaces
        return cleaned;
    }

Unicode whitespace

HT: U+0009 = character tabulation
LF: U+000A = line feed
CR: U+000D = carriage return

What am I doing wrong?

Resources

Unicode characters: https://unicode-table.com/en
Whitespace: https://en.wikipedia.org/wiki/Whitespace_character
Regex: https://msdn.microsoft.com/es-es/library/system.text.regularexpressions.regex(v=vs.110).aspx

Solution: thanks to @wiktor-stribiżew and @mathias-r-jessen, this is what worked:

    string pattern = @"[\u0020\u0009\u000D\u00A0]+";
    // \u00A0 (no-break space) is included so that &nbsp; is replaced too
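
For readers who want to see that pattern in context, here is a minimal sketch of how the whole method might look. It assumes the replacement string is a single regular space (the code in the question used an empty string, which deletes the matches instead of replacing them), and the class name SpaceCleaner is made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class name, used only for this illustration.
public static class SpaceCleaner
{
    // Positive character class: space, tab, carriage return, no-break space.
    private static readonly Regex SpaceRun =
        new Regex(@"[\u0020\u0009\u000D\u00A0]+");

    // Collapses each run of the listed characters into one regular space,
    // then trims leading and trailing whitespace.
    public static string CleanUnicodeSpaces(string value)
    {
        // Assumption: the replacement is " " rather than the "" from the question.
        return SpaceRun.Replace(value, " ").Trim();
    }
}
```

Under those assumptions, SpaceCleaner.CleanUnicodeSpaces("caf\u00E9\u00A0\u00A0au\tlait") should give "café au lait", with the accent left intact. Note that U+000A (line feed) is not in this class, so line feeds inside the string are left alone.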

Answer 1

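Your regex, [^\u0020\u0009\u000D]+, is a negated character class that matches one or more characters other than a regular space (\u0020), a tab (\u0009) and a carriage return (\u000D). What you are actually looking for is a positive character class that matches one of the three characters you listed (\x0A for a line feed, \x0D for a carriage return and \x09 for a tab) so that each can be replaced with a regular space (\x20).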

You can just use

    var res = Regex.Replace(s, @"[\x0A\x0D\x09]", " ");

See the regex demo.
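
To make the difference between the negated and the positive character class concrete, the following sketch runs both patterns over the same input. The class name CharClassDemo, the method names and the sample string are invented for illustration; only the two patterns come from the question and the answer.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class and method names, used only for this illustration.
public static class CharClassDemo
{
    // Negated class from the question: removes every run of characters
    // that are NOT space, tab, or carriage return.
    public static string WithNegatedClass(string s)
    {
        return Regex.Replace(s, @"[^\u0020\u0009\u000D]+", "");
    }

    // Positive class from the answer: each line feed, carriage return,
    // or tab becomes one regular space.
    public static string WithPositiveClass(string s)
    {
        return Regex.Replace(s, @"[\x0A\x0D\x09]", " ");
    }
}
```

For the input "a\tb\r\nc", WithNegatedClass leaves only the tab and the carriage return ("\t\r"), because the letters and the line feed all fall outside the listed set and are deleted. WithPositiveClass keeps the letters and returns "a b  c", with two spaces between b and c since the carriage return and the line feed are replaced separately.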

Solution 1
4, accepted, 2017-09-04 21:47:44
