Java Regex - Remove Non-Alphanumeric characters except line breaks
I'm trying to remove all non-alphanumeric characters from a String in Java but keep the line breaks. I have the following regular expression, but it keeps joining the words before and after a line break.
[^\\p{Alnum}\\s]
How can I preserve the line breaks, or convert them into spaces, so that words don't get joined?
An example of this issue is shown below:
Original Text:
and refreshingly direct
when compared with the hand-waving of Swinburne.
After Replacement:
and refreshingly directwhen compared with the hand-waving of Swinburne.
You may add these chars to the regex rather than \\s, since \\s matches any whitespace:
String reg = "[^\\p{Alnum}\n\r]";
Or, you may use character class subtraction:
String reg = "[\\P{Alnum}&&[^\n\r]]";
Here, \\P{Alnum} matches any non-alphanumeric character, and &&[^\\n\\r] prevents LF and CR from matching.
String s = "&&& Text\r\nNew line".replaceAll("[^\\p{Alnum}\n\r]+", "");
System.out.println(s);
// => Text
//    Newline
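As a quick check that the two patterns above behave the same, here is a minimal sketch run against the sample text from the question. The class and member names (`ClassSubtractionDemo`, `NEGATED`, `SUBTRACTED`, `strip`) are illustrative only and not part of the original answer. Note that, unlike the pattern in the question, neither pattern excludes spaces, so spaces are removed along with the punctuation.

```java
public class ClassSubtractionDemo {
    // Negated class: anything that is not alphanumeric, LF, or CR.
    public static final String NEGATED = "[^\\p{Alnum}\n\r]+";
    // Intersection: non-alphanumerics that are also neither LF nor CR.
    public static final String SUBTRACTED = "[\\P{Alnum}&&[^\n\r]]+";

    // Removes every run of characters matched by the given regex.
    public static String strip(String text, String regex) {
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String text = "and refreshingly direct\n"
                + "when compared with the hand-waving of Swinburne.";
        // Both patterns should yield the same string, with the line break kept.
        System.out.println(strip(text, NEGATED).equals(strip(text, SUBTRACTED)));
        System.out.println(strip(text, SUBTRACTED));
    }
}
```

If spaces should survive as well, a space can be added to the negated class, e.g. "[^\\p{Alnum} \n\r]+".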
Note that there are more line break chars than LF and CR. In Java 8, the \\R construct matches any style of line break; it is equivalent to \\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029].
So, to exclude matching any line breaks, you may use:
String reg = "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";
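A minimal sketch of that pattern in use, assuming Java 8 or later for \\R. The class and method names (`LineBreakCleaner`, `keepLineBreaks`, `lineBreaksToSpaces`) are hypothetical. The second method shows the other option raised in the question: turning each line break into a single space before stripping.

```java
public class LineBreakCleaner {
    // Everything except alphanumerics and the line-break characters covered by \R.
    private static final String NON_ALNUM_EXCEPT_BREAKS =
            "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";

    // Removes non-alphanumerics but leaves every line break untouched.
    public static String keepLineBreaks(String text) {
        return text.replaceAll(NON_ALNUM_EXCEPT_BREAKS, "");
    }

    // Replaces each line break (CRLF counts as one) with a space, then removes
    // everything that is neither alphanumeric nor a space.
    public static String lineBreaksToSpaces(String text) {
        return text.replaceAll("\\R", " ").replaceAll("[^\\p{Alnum} ]+", "");
    }

    public static void main(String[] args) {
        String text = "refreshingly direct\r\nwhen compared, with hand-waving.";
        System.out.println(keepLineBreaks(text));
        System.out.println(lineBreaksToSpaces(text));
    }
}
```

Keep in mind that \\p{Alnum} is ASCII-only by default, so accented letters would be removed too.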
You can use the regex [^A-Za-z0-9\\n\\r], for example:
String result = str.replaceAll("[^a-zA-Z0-9\\n\\r]", "");
Example
Input
aaze03.aze1654aze987 */-a*azeaze\n hello *-*/zeaze+64\nqsdoi
Output
aaze03aze1654aze987aazeaze
hellozeaze64
qsdoi
I made a mistake in my code. I was reading a file line by line and building the String, but I didn't add a space at the end of each line. Therefore there were no actual line breaks to replace.
That's a perfect case for Guava's CharMatcher:
String input = "and refreshingly direct\n\rwhen compared with the hand-waving of Swinburne.";
String output = CharMatcher.javaLetterOrDigit().or(CharMatcher.whitespace()).retainFrom(input);
Output will be:
and refreshingly direct
when compared with the handwaving of Swinburne