
Removing Unicode characters from a String in Java using regex

I have an input string like the one below.

String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car. I currently drive a Hyundai Accent and I was looking for something a little bit larger and more comfortable like the Honda Civic. May I know if you have any of the models currently in stock? Thank you! Warm regards Sandra";

I want to remove Unicode characters such as "\u2028" and "\u2019" if they are present in the comment. At runtime I don't know which extra characters will arrive. So what is the best way to handle this?

I tried the following, which removes the Unicode characters from the given string:

comment = comment.replaceAll("\\P{Print}", "");

So what is the best way to check whether Unicode characters are present in the comment, remove them if they are, and otherwise pass the comment unchanged to the target system?

Can anyone please help me resolve this?

You can do this in sequential steps, as shown below:

public static void main(String[] args) {
    String comment = "Good morning! \u2028\u2028I am looking to purchase a new Honda car as I\u2019m outgrowing my current car. I currently drive a Hyundai Accent and I was looking for something a little bit larger and more comfortable like the Honda Civic. May I know if you have any of the models currently in stock? Thank you! Warm regards Sandra";

    // Remove all non-ASCII characters (anything above U+007F)
    comment = comment.replaceAll("[^\\x00-\\x7F]", "");

    // Remove the ASCII control characters, keeping \r, \n and \t
    comment = comment.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "");

    // Remove anything left in the Unicode "Other" category (\p{C});
    // note that this category also covers \r, \n and \t
    comment = comment.replaceAll("\\p{C}", "");
    System.out.println(comment);
}
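
The question also asks how to check whether such characters are present and only then remove them. A minimal sketch of that idea follows. The class name `CommentSanitizer`, its method names, and the definition of "unwanted" (anything outside printable ASCII, except tab, line feed and carriage return) are assumptions made for illustration and are not part of the answer above.

```java
import java.util.regex.Pattern;

final class CommentSanitizer {
    // Assumption: "unwanted" means anything outside printable ASCII (space to ~),
    // except tab, line feed and carriage return.
    private static final Pattern UNWANTED = Pattern.compile("[^\\x20-\\x7E\\t\\n\\r]");

    /** Returns true if the comment contains at least one unwanted character. */
    public static boolean hasUnwanted(String comment) {
        return UNWANTED.matcher(comment).find();
    }

    /** Strips unwanted characters if any are present; otherwise returns the input as-is. */
    public static String sanitize(String comment) {
        if (!hasUnwanted(comment)) {
            return comment; // nothing to remove, pass the comment through unchanged
        }
        return UNWANTED.matcher(comment).replaceAll("");
    }

    public static void main(String[] args) {
        String comment = "Good morning! \u2028\u2028I\u2019m outgrowing my current car.";
        // The two line separators and the curly apostrophe are dropped.
        System.out.println(sanitize(comment)); // prints: Good morning! Im outgrowing my current car.
    }
}
```

Compiling the pattern once and reusing it avoids recompiling the regex on every call, which `String.replaceAll` would otherwise do.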

If you use replace, you will lose some characters. For example, I’m will become Im. So the better approach is to convert the characters instead.

You can encode the string as UTF-8 bytes and decode it back:

// requires: import java.nio.charset.StandardCharsets;
byte[] byteComment = comment.getBytes(StandardCharsets.UTF_8);

String formattedComment = new String(byteComment, StandardCharsets.UTF_8);
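
Note that for a well-formed string, encoding to UTF-8 and decoding again gives back an equal string, so the round trip alone does not change characters such as the curly apostrophe. One way to convert rather than delete, sketched below, is to map known typographic characters to ASCII equivalents first and only then strip whatever non-ASCII remains. The class name `CommentNormalizer`, the method `toAscii`, and the particular mappings (including turning line and paragraph separators into a single space) are assumptions chosen for illustration; the table would need extending for the characters a real system actually receives.

```java
import java.nio.charset.StandardCharsets;

final class CommentNormalizer {
    /** Maps a few typographic characters to ASCII, then drops any remaining non-ASCII. */
    public static String toAscii(String comment) {
        return comment
                .replace('\u2018', '\'')                   // left single quotation mark
                .replace('\u2019', '\'')                   // right single quotation mark
                .replace('\u201C', '"')                    // left double quotation mark
                .replace('\u201D', '"')                    // right double quotation mark
                .replaceAll(" *[\\u2028\\u2029]+ *", " ")  // line/paragraph separators to one space
                .replaceAll("[^\\x00-\\x7F]", "");         // drop anything still outside ASCII
    }

    public static void main(String[] args) {
        String comment = "Good morning! \u2028\u2028I\u2019m outgrowing my current car.";

        // The UTF-8 round trip leaves this string unchanged.
        String roundTrip = new String(comment.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(roundTrip.equals(comment)); // prints: true

        // The apostrophe survives as a plain ASCII quote.
        System.out.println(toAscii(comment)); // prints: Good morning! I'm outgrowing my current car.
    }
}
```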
