Regex to remove all non-alphanumeric characters, with support for any language?

Question

I'd like to do this with Pattern's compile method, for example:

import java.util.regex.Pattern;

String text = "Where? What is that, an animal? No! It is a plane.";
Pattern p = Pattern.compile("<some regex here>"); // Pattern has no public constructor; use compile()
String delim = p.matcher(text).replaceAll("");

What regular expression would do what I'm trying to accomplish?

Example strings:

English

Input: "Where? What is that, an animal? No! It is a plane."
Output: "Where What is that an animal No It is a plane"

Spanish

Input: "¿Dónde? ¿Qué es eso, un animal? ¡No! Es un avión."
Output: "Dónde Qué es eso un animal No Es un avión"

Portuguese

Input: "Onde? O que é isso, um animal? Não! É um avião."
Output: "Onde O que é isso um animal Não É um avião"

I hope these examples make clear what I'm trying to accomplish. Thanks, everyone!

Answer 1

I'm not an expert in every language in the world, but you can meet your requirement on a per-language basis:

Pattern rgx = Pattern.compile("[^a-zA-Z0-9 <put language specific characters to preserve here>]");
str = rgx.matcher(str).replaceAll("");

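I speak English and Korean, and I can tell you that Korean uses the same punctuation as English. As shown above, rather than listing each language's punctuation, you add the characters that should be preserved. For example, suppose the tilde should not be treated as punctuation. Then use the regex: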

[^a-zA-Z0-9 ~]
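Translated into Java's Pattern API, the whitelist approach might look like the sketch below. The accented letters in the class (áéíóúüñ and their capitals) are an assumption chosen to cover the Spanish example only, and the class and method names are hypothetical. It also shows the weakness of this approach: any letter missing from the list, such as Portuguese ã, is stripped as well.

```java
import java.util.regex.Pattern;

public class WhitelistStrip {
    // Hypothetical Spanish whitelist: ASCII letters and digits, the space,
    // and the accented letters used in Spanish. Everything else is removed.
    static final Pattern NOT_WHITELISTED =
            Pattern.compile("[^a-zA-Z0-9 áéíóúüñÁÉÍÓÚÜÑ]");

    static String strip(String input) {
        return NOT_WHITELISTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Works for the Spanish sentence.
        System.out.println(strip("¿Dónde? ¿Qué es eso, un animal? ¡No! Es un avión."));
        // ã is not whitelisted, so "Não" and "avião" lose it.
        System.out.println(strip("Não! É um avião."));
    }
}
```

Every new language means extending the class by hand, which is why a Unicode-aware character class is usually the better fit for "any language".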

Answer 2

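Java's Pattern class, Java's regex implementation, supports Unicode categories such as \\p{Lu}. Since you want alphanumeric characters, the relevant categories are L (letters) and N (numbers).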

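Since your examples show that you also want to keep spaces, you need to include them too. Let's use the predefined character class \\s, so that newlines and tabs are preserved as well.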

To match any character other than the specified ones, use a negated character class: [^abc]

Putting it all together gives [^\\s\\p{L}\\p{N}]:

String output = input.replaceAll("[^\\s\\p{L}\\p{N}]+", "");

Where What is that an animal No It is a plane
Dónde Qué es eso un animal No Es un avión
Onde O que é isso um animal Não É um avião
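For reference, here is a self-contained sketch of the same regex written with Pattern.compile, as the question asked. The pattern is compiled once into a constant so it can be reused across calls; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class StripPunctuation {
    // One or more characters that are neither whitespace (\s),
    // a Unicode letter (\p{L}) nor a Unicode number (\p{N}).
    static final Pattern NON_ALNUM = Pattern.compile("[^\\s\\p{L}\\p{N}]+");

    static String strip(String input) {
        return NON_ALNUM.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Where? What is that, an animal? No! It is a plane.",
            "¿Dónde? ¿Qué es eso, un animal? ¡No! Es un avión.",
            "Onde? O que é isso, um animal? Não! É um avião."
        };
        for (String s : inputs) {
            System.out.println(strip(s));
        }
    }
}
```

Because \p{N} is included, digits survive too, e.g. "Room 42!" becomes "Room 42".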

Or see the demo on regex101.com.

Of course, there are several ways to do this.

You can also use the POSIX character class \\p{Alnum} and enable UNICODE_CHARACTER_CLASS with the inline flag (?U).

String output = input.replaceAll("(?U)[^\\s\\p{Alnum}]+", "");

Where What is that an animal No It is a plane
Dónde Qué es eso un animal No Es un avión
Onde O que é isso um animal Não É um avião
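The inline (?U) corresponds to the Pattern.UNICODE_CHARACTER_CLASS flag, which can also be passed to Pattern.compile. The sketch below (class name hypothetical) uses the flag and, for contrast, compiles the same regex without it. Without the flag, \p{Alnum} is the ASCII-only POSIX class [a-zA-Z0-9], so accented letters are stripped along with the punctuation.

```java
import java.util.regex.Pattern;

public class UnicodeAlnum {
    // With UNICODE_CHARACTER_CLASS, \p{Alnum} covers Unicode alphabetic characters and digits.
    static final Pattern UNICODE =
            Pattern.compile("[^\\s\\p{Alnum}]+", Pattern.UNICODE_CHARACTER_CLASS);
    // Without the flag, \p{Alnum} means ASCII [a-zA-Z0-9] only.
    static final Pattern ASCII = Pattern.compile("[^\\s\\p{Alnum}]+");

    public static void main(String[] args) {
        String text = "¿Dónde? ¿Qué es eso?";
        System.out.println(UNICODE.matcher(text).replaceAll("")); // Dónde Qué es eso
        System.out.println(ASCII.matcher(text).replaceAll(""));   // Dnde Qu es eso
    }
}
```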

Now, if you don't want to keep the spaces either, you can simplify this with the negated form \\P{xx}:

String output = input.replaceAll("(?U)\\P{Alnum}+", "");

WhereWhatisthatananimalNoItisaplane
DóndeQuéesesounanimalNoEsunavión
OndeOqueéissoumanimalNãoÉumavião
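The two families of patterns above are close but not identical. A minimal sketch, with a hypothetical class name and a made-up input string: under (?U), \p{Alnum} means alphabetic characters plus decimal digits (Nd), while \p{N} also includes other numeric characters such as the superscript ². On the question's sentences both spaceless forms agree; they differ only on input like this.

```java
import java.util.regex.Pattern;

public class CompareForms {
    // Unicode categories: letters (L) and all numbers (N, which includes Nd, Nl and No).
    static final Pattern CATEGORIES = Pattern.compile("[^\\p{L}\\p{N}]+");
    // POSIX class under (?U): alphabetic characters and decimal digits only.
    static final Pattern POSIX_UNICODE = Pattern.compile("(?U)\\P{Alnum}+");

    public static void main(String[] args) {
        String text = "Área: 5 m²!";
        System.out.println(CATEGORIES.matcher(text).replaceAll(""));    // Área5m²
        System.out.println(POSIX_UNICODE.matcher(text).replaceAll("")); // Área5m
    }
}
```

Which behavior is "right" depends on whether characters like ² should count as alphanumeric for your use case.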

2 answers

Answer 1: score 1, posted 2017-07-11 02:06:07

Answer 2: score 1, accepted, posted 2017-07-11 03:09:52
