
replace any non-ascii character in a string in java

How would one convert -lrb-300-rrb- 922-6590 to -lrb-300-rrb- 922-6590 in java?

Have tried the following:

t.lemma = lemma.replaceAll("\\p{C}", " ");
t.lemma = lemma.replaceAll("[\u0000-\u001f]", " ");

Am probably missing something conceptual. Will appreciate any pointers to the solution.

Thank you

Try the following:

str = str.replaceAll("[^\\p{ASCII}]", " ");

By the way, \p{ASCII} matches any ASCII character; it is equivalent to [\x00-\x7F].
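A minimal sketch of why the attempts in the question leave the string unchanged while the negated ASCII class works. The question does not say which character is involved, so this assumes it is a no-break space (U+00A0); the class and method names are hypothetical.

```java
class NonAsciiDemo {
    // Replaces every character outside the ASCII range (0x00-0x7F) with a space.
    static String replaceNonAscii(String s) {
        return s.replaceAll("[^\\p{ASCII}]", " ");
    }

    public static void main(String[] args) {
        // Assumption: the offending character is U+00A0 (no-break space).
        String input = "-lrb-300-rrb-\u00A0922-6590";

        // \p{C} covers control, format, surrogate, private-use and unassigned
        // characters. U+00A0 is a space separator (Zs), so nothing is replaced.
        System.out.println(input.replaceAll("\\p{C}", " ").equals(input)); // prints "true"

        // The negated ASCII class does match U+00A0.
        System.out.println(replaceNonAscii(input)); // prints "-lrb-300-rrb- 922-6590"
    }
}
```

The second attempt in the question, [\u0000-\u001f], misses it for the same reason: U+00A0 lies outside that range.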

On the other hand, you should store the compiled Pattern in a constant to avoid recompiling the expression on every call.

import java.util.regex.Pattern;

public class Main {
    // Compiled once and reused; matches any single non-ASCII character.
    private static final Pattern REGEX_PATTERN =
            Pattern.compile("[^\\p{ASCII}]");

    public static void main(String[] args) {
        String input = "-lrb-300-rrb- 922-6590";
        System.out.println(
            REGEX_PATTERN.matcher(input).replaceAll(" ")
        );  // prints "-lrb-300-rrb- 922-6590"
    }
}


Assuming you only want to keep a-zA-Z0-9 and punctuation characters, you can do this:

t.lemma = lemma.replaceAll("[^\\p{Punct}\\w]", " ");
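A small sketch of what this pattern keeps and what it replaces. By default, Java's \w is [a-zA-Z_0-9] (so it also keeps the underscore) and \p{Punct} is ASCII punctuation only; accented letters and all whitespace are therefore replaced with a space. The class name, method name, and sample input are made up for illustration.

```java
class KeepWordAndPunct {
    // Replaces everything except ASCII letters, digits, underscore and
    // ASCII punctuation with a single space per replaced character.
    static String clean(String s) {
        return s.replaceAll("[^\\p{Punct}\\w]", " ");
    }

    public static void main(String[] args) {
        // Hypothetical input with an accented letter (U+00E9) and a no-break space (U+00A0).
        String input = "caf\u00E9 (300)\u00A0922-6590";
        // The accented letter and the no-break space each become a space;
        // the ordinary space is replaced by a space too, so it is unchanged.
        System.out.println(clean(input)); // prints "caf  (300) 922-6590"
    }
}
```

Each replaced character yields its own space, so runs of spaces can appear; collapse them afterwards if that matters.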
