
Java Regex - Remove Non-Alphanumeric characters except line breaks

I'm trying to remove all the non-alphanumeric characters from a String in Java but keep the carriage returns. I have the following regular expression, but it keeps joining words before and after a line break.

[^\\p{Alnum}\\s]

How would I be able to preserve the line breaks or convert them into spaces so that I don't have words joining?

An example of this issue is shown below:

Original Text

and refreshingly direct
when compared with the hand-waving of Swinburne.

After Replacement:

 and refreshingly directwhen compared with the hand-waving of Swinburne.

You can list the line break characters in the negated class instead of \\s, since \\s matches any whitespace (spaces and tabs included), not only line breaks:

String reg = "[^\\p{Alnum}\n\r]";

Or, you may use character class subtraction :

String reg = "[\\P{Alnum}&&[^\n\r]]";

Here, \\P{Alnum} matches any non-alphanumeric character, and &&[^\n\r] excludes LF and CR from the match.
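As a rough check that the two patterns behave the same way, here is a minimal sketch. The class name KeepLineBreaks and the helper strip are made up for illustration. Note that spaces are removed too, because only LF and CR are exempted.

```java
public class KeepLineBreaks {
    // Negated class: anything that is not an ASCII letter, digit, LF, or CR.
    static final String NEGATED = "[^\\p{Alnum}\n\r]+";
    // Intersection: non-alphanumeric AND not LF/CR.
    static final String INTERSECTION = "[\\P{Alnum}&&[^\n\r]]+";

    // Hypothetical helper: removes every match of the given regex.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String text = "and refreshingly direct\nwhen compared with the hand-waving of Swinburne.";
        String a = strip(text, NEGATED);
        String b = strip(text, INTERSECTION);
        System.out.println(a);
        // andrefreshinglydirect
        // whencomparedwiththehandwavingofSwinburne
        System.out.println(a.equals(b)); // true
    }
}
```

If the spaces between words should survive, a literal space can be added to the exempted characters, for example "[^\\p{Alnum} \n\r]+".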

A Java test :

String s = "&&& Text\r\nNew line".replaceAll("[^\\p{Alnum}\n\r]+", "");
System.out.println(s); 
// => Text
// Newline

Note that there are more line break characters than LF and CR. Since Java 8, the \\R construct matches any style of line break; it is equivalent to \\u000D\\u000A|[\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029].

So, to exclude matching any line breaks, you may use

String reg = "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";
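The question also asks about turning line breaks into spaces rather than keeping them. A minimal sketch of that variant, assuming Java 8 or later for \\R; the class name LineBreaksToSpaces and the method clean are illustrative only:

```java
public class LineBreaksToSpaces {
    // Hypothetical helper: line breaks become spaces, then everything except
    // ASCII letters, digits, and spaces is dropped.
    static String clean(String input) {
        return input
                .replaceAll("\\R", " ")              // CRLF counts as one break, so one space
                .replaceAll("[^\\p{Alnum} ]+", "");  // keep letters, digits, spaces
    }

    public static void main(String[] args) {
        String text = "and refreshingly direct\r\nwhen compared with the hand-waving of Swinburne.";
        System.out.println(clean(text));
        // and refreshingly direct when compared with the handwaving of Swinburne
    }
}
```

Replacing the line breaks first matters here: if the second replaceAll ran first, it would delete them before they could be converted.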

You can use the regex [^A-Za-z0-9\\n\\r], for example:

String result = str.replaceAll("[^a-zA-Z0-9\\n\\r]", "");

Example

Input

aaze03.aze1654aze987  */-a*azeaze\n hello *-*/zeaze+64\nqsdoi

Output

aaze03aze1654aze987aazeaze
hellozeaze64
qsdoi

I made a mistake with my code. I was reading in a file line by line and building the String, but didn't add a space at the end of each line. Therefore there were no actual line breaks to replace.
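For anyone hitting the same problem, a minimal sketch of the fix, assuming Java 8 or later: BufferedReader.readLine() and lines() strip the line terminator, so a separator has to be put back when the String is rebuilt. The class name JoinLines and the method readAll are made up, and a StringReader stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.stream.Collectors;

public class JoinLines {
    // Hypothetical helper: rebuilds the text with "\n" between lines,
    // because the reader drops the original terminators.
    static String readAll(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) throws IOException {
        // StringReader is a stand-in for a FileReader on the actual file.
        String raw = "and refreshingly direct\r\nwhen compared with the hand-waving of Swinburne.";
        try (BufferedReader reader = new BufferedReader(new StringReader(raw))) {
            String text = readAll(reader);
            // The original pattern from the question now has line breaks to preserve.
            System.out.println(text.replaceAll("[^\\p{Alnum}\\s]", ""));
        }
    }
}
```

Joining with " " instead of "\n" would give the single-line variant with words kept apart.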

That's a perfect case for Guava's CharMatcher :

String input = "and refreshingly direct\n\rwhen compared with the hand-waving of Swinburne.";
String output = CharMatcher.javaLetterOrDigit().or(CharMatcher.whitespace()).retainFrom(input);

Output will be:

and refreshingly direct
when compared with the handwaving of Swinburne
