Regex to exclude special characters in Java

Question

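I want to write a regular expression that allows letters, digits, and spaces, but excludes special characters such as !'^+%&/()=?_-*£#$ and so on.

I thought I could use [a-zA-Z] for letters, [0-9] for digits, and \\s for whitespace:

[a-zA-Z0-9\\s]

However, the strings I want to clean may contain letters such as é, ü, ğ, i, ç. I don't want those letters to be removed.

Is it possible to write such a regex?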

Answer 1

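Yes, it is possible.

\\p{L} matches any Unicode letter: a-z as well as letters such as é, ü, ğ, i, ç
\\d matches a digit (equivalent to [0-9])
\\s matches a space, tab, carriage return, newline, vertical tab, or form feed

[\\p{L}\\d\\s]+ matches one or more characters from that set.

Here you can see an example: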

https://regex101.com/r/uQmu7a/1
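As a rough Java sketch of how the class above could be used to clean a string: negating it ([^\\p{L}\\d\\s]) selects everything that is not a letter, digit, or whitespace, which can then be removed. The class and method names here (SpecialCharStripper, strip, isClean) are made up for illustration and are not part of the answer.

```java
import java.util.regex.Pattern;

class SpecialCharStripper {
    // Negated form of the answer's class: one or more characters that are
    // NOT a Unicode letter, an ASCII digit, or whitespace.
    private static final Pattern NOT_ALLOWED = Pattern.compile("[^\\p{L}\\d\\s]+");

    // Removes every run of disallowed characters from the input.
    static String strip(String input) {
        return NOT_ALLOWED.matcher(input).replaceAll("");
    }

    // True if the whole input consists only of letters, digits, and whitespace.
    static boolean isClean(String input) {
        return input.matches("[\\p{L}\\d\\s]+");
    }

    public static void main(String[] args) {
        // Unicode escapes stand for e-acute (U+00E9) and g-breve (U+011F).
        System.out.println(strip("caf\u00e9 #1 (da\u011f)!"));
    }
}
```

One caveat with this sketch: a letter stored in decomposed form (base letter plus a combining mark) would lose its mark, because combining marks are in category M rather than L. Adding \\p{M} to the class is one way to keep them.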

Answer 2

If you would rather not use a regex, you can use Apache Commons Lang's StringUtils.isAlphanumericSpace(String str).
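Note that the library method only tests a string; it does not clean it. If Commons Lang is not on the classpath, a JDK-only approximation might look like the sketch below. It assumes the same rule (each character is a Unicode letter or digit, or a plain space) and adds a hypothetical helper, keepAlphanumericSpace, that drops everything else. Both methods are illustrative, not the library's code.

```java
class AlnumSpaceCheck {
    // Approximates the check: null is rejected, an empty string is accepted,
    // and every char must be a Unicode letter/digit or the space ' '.
    static boolean isAlphanumericSpace(String s) {
        if (s == null) {
            return false;
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (!Character.isLetterOrDigit(c) && c != ' ') {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: returns a copy with all other characters dropped.
    static String keepAlphanumericSpace(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isLetterOrDigit(c) || c == ' ') {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(isAlphanumericSpace("ab c1"));
        // The escape stands for c-cedilla (U+00E7).
        System.out.println(keepAlphanumericSpace("a_b! \u00e7"));
    }
}
```

Unlike \\s in the regex answers, this accepts only the space character, so tabs and newlines are treated as disallowed.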

Answer 3

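You can take a different approach.

Note: both of these regexes must be run with the Unicode character class flag enabled.

There are two ways.

Use alnum and stay within the ASCII and Extended-ASCII range.

Note that U+011F ğ LATIN SMALL LETTER G WITH BREVE lies outside the 0-FF range used in the regex below, so it will not be matched.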

(?:\\p{Alnum}(?<=[\\x{00}-\\x{FF}])|\\s)+

Explained

 (?:
      \p{Alnum}                     # Any alpha numeric Unicode
      (?<= [\x{00}-\x{FF}] )        # In the  U+0 - U+0FF codepoint range
   |                              # or,
      \s                            # Whitespace
 )+
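A minimal sketch of running this first regex in Java, assuming the "Unicode character class flag" mentioned above corresponds to Pattern.UNICODE_CHARACTER_CLASS. The class name Latin1AlnumMatcher and the method matchesAll are invented for the example.

```java
import java.util.regex.Pattern;

class Latin1AlnumMatcher {
    // The answer's first regex, compiled with the Unicode character class
    // flag so that \p{Alnum} and \s follow their Unicode definitions.
    private static final Pattern LATIN1_ALNUM = Pattern.compile(
            "(?:\\p{Alnum}(?<=[\\x{00}-\\x{FF}])|\\s)+",
            Pattern.UNICODE_CHARACTER_CLASS);

    // True only if the entire input matches the pattern.
    static boolean matchesAll(String input) {
        return LATIN1_ALNUM.matcher(input).matches();
    }

    public static void main(String[] args) {
        // e-acute (U+00E9) is inside the 0-FF range.
        System.out.println(matchesAll("caf\u00e9 123"));
        // g-breve (U+011F) is outside the 0-FF range.
        System.out.println(matchesAll("da\u011f"));
    }
}
```

The lookbehind re-checks the character that \p{Alnum} just consumed, which is how the pattern intersects "alphanumeric" with "code point at most U+00FF".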

Alternatively, take the Latin-class route: use the Latin blocks/script and stay within the alnum range.

(?:[\\p{Block=Latin_1_Supplement}\\p{Block=Latin_Extended_A}\\p{Block=Latin_Extended_Additional}\\p{Block=Latin_Extended_B}\\p{Block=Latin_Extended_C}\\p{Block=Latin_Extended_D}\\p{Block=Basic_Latin}\\p{Script=Latin}](?<=\\p{Alnum})|\\s)+

Expanded

 (?:
      [\p{Block=Latin_1_Supplement}\p{Block=Latin_Extended_A}\p{Block=Latin_Extended_Additional}\p{Block=Latin_Extended_B}\p{Block=Latin_Extended_C}\p{Block=Latin_Extended_D}\p{Block=Basic_Latin}\p{Script=Latin}]
      (?<= \p{Alnum} )
   |
      \s
 )+
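The second regex can be tried the same way. This sketch again assumes Pattern.UNICODE_CHARACTER_CLASS is the intended flag and a Java version that knows the Latin Extended-C and Extended-D blocks (Java 7 or later). LatinAlnumMatcher and matchesAll are illustrative names only.

```java
import java.util.regex.Pattern;

class LatinAlnumMatcher {
    // The answer's second regex: any character from the listed Latin blocks
    // or the Latin script, restricted to alphanumerics by the lookbehind.
    private static final Pattern LATIN_ALNUM = Pattern.compile(
            "(?:[\\p{Block=Latin_1_Supplement}\\p{Block=Latin_Extended_A}"
            + "\\p{Block=Latin_Extended_Additional}\\p{Block=Latin_Extended_B}"
            + "\\p{Block=Latin_Extended_C}\\p{Block=Latin_Extended_D}"
            + "\\p{Block=Basic_Latin}\\p{Script=Latin}]"
            + "(?<=\\p{Alnum})|\\s)+",
            Pattern.UNICODE_CHARACTER_CLASS);

    // True only if the entire input matches the pattern.
    static boolean matchesAll(String input) {
        return LATIN_ALNUM.matcher(input).matches();
    }

    public static void main(String[] args) {
        // g-breve (U+011F) is in Latin Extended-A, so it matches here.
        System.out.println(matchesAll("da\u011f 42"));
        // Greek alpha (U+03B1) is a letter but not Latin.
        System.out.println(matchesAll("\u03b1"));
    }
}
```

Compared with the first regex, this variant accepts ğ but still rejects letters from other scripts as well as ASCII punctuation such as the hyphen.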

3 answers

Answer 1
2 (accepted) 2017-05-28 22:35:09

Answer 2
0 2017-05-28 22:43:17

Answer 3
0
