Remove special characters in Java

Hi, I'm trying to figure out a way to remove the tags from the results returned by the Google Feed API. A typical result looks like this:

   Breaking \u003cb\u003eNews\u003c/b\u003e Updates

How can we remove these characters? I'm not sure if RegEx would be better (or worse). Does anyone have an idea on how to remove these? Google does not supply an option to remove tags from the results in Java.

I usually strip those out with:

str = str.replaceAll("\\p{Cntrl}", "");
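
As a rough check of what this pattern covers, the sketch below applies it to two inputs. The class and method names are hypothetical. It assumes `\p{Cntrl}` is Java's POSIX control-character class, which matches characters such as tab and newline. Note that it leaves tags like `<b>` in place, so it would only help if the unwanted characters really are control characters.

```java
class ControlCharDemo {
    // Removes ASCII control characters (tab, newline, DEL, ...) and nothing else.
    static String stripControlChars(String s) {
        return s.replaceAll("\\p{Cntrl}", "");
    }

    public static void main(String[] args) {
        // The tab and the newline are removed.
        System.out.println(stripControlChars("Breaking\tNews\n"));
        // Angle brackets are not control characters, so the tags stay.
        System.out.println(stripControlChars("Breaking <b>News</b> Updates"));
    }
}
```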

The best solution would be to let a JSON parser decode the data. In JavaScript, for example:

JSON.parse(JSON.stringify({a : '<put your string here>'}));

This is appropriate because the data you get from the Google API is in JSON form.
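
The snippet above is JavaScript, while the question asks about Java. In Java, a JSON library would normally decode the `\u003c` escapes while parsing the response. To illustrate just that decoding step without a library, here is a minimal sketch using only the standard library. The class `UnicodeEscapeDecoder` and its `decode` method are hypothetical names. It assumes the input holds the escapes as literal text, and it does not handle other JSON escapes such as an escaped backslash.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDecoder {
    // A literal backslash, the letter u, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape sequence with the character it encodes.
    static String decode(String raw) {
        Matcher m = ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());                    // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16)); // the decoded character
            last = m.end();
        }
        out.append(raw, last, raw.length());                     // remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        // Doubled backslashes keep the escapes as literal text in the string.
        String raw = "Breaking \\u003cb\\u003eNews\\u003c/b\\u003e Updates";
        System.out.println(decode(raw)); // prints: Breaking <b>News</b> Updates
    }
}
```

The decoded string still contains the `<b>` tags, so a second step is needed to remove them.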

You can use the regex below:

String str = "Breaking \u003cb\u003eNews\u003c/b\u003e Updates";
str = str.replaceAll("\\<(.*)?\\>(.*)\\</\\1\\>", "$2");
System.out.println(str);

Output:

Breaking News Updates

  • \\<(.*)?\\> matches the first opening tag: <b>
  • \\</\\1\\> matches the corresponding closing tag: </b>
  • \\1 is a backreference to the tag name, so that only a correctly paired opening and closing tag are matched.

So for <b>news <update></b>, the outer <b> pair is removed but <update> is not, because it has no matching closing tag.
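
The behaviour described above can be checked with a small sketch that wraps the same regex in a helper. The class and method names are hypothetical. Because the Java compiler translates the `\u003c` escapes in the answer's string literal into real `<` and `>` characters, the examples here write the tags directly.

```java
class TagRegexDemo {
    // Same pattern as in the answer: group 1 is the tag name, group 2 the inner text.
    static String stripPairedTags(String s) {
        return s.replaceAll("\\<(.*)?\\>(.*)\\</\\1\\>", "$2");
    }

    public static void main(String[] args) {
        // The <b>...</b> pair is replaced by its inner text.
        System.out.println(stripPairedTags("Breaking <b>News</b> Updates")); // prints: Breaking News Updates
        // The unpaired <update> tag has no closing partner, so it is kept.
        System.out.println(stripPairedTags("<b>news <update></b>"));         // prints: news <update>
    }
}
```

This only removes tags that have a matching closing tag, which is one reason a parser tends to be more robust for arbitrary HTML.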

This is HTML. \u003cb\u003e translates to <b>.

You'll want to use an HTML parser, as HTML cannot be fully parsed by a regular expression.

With a library like Jsoup you could do this as follows, where html holds the string returned by the API:

String data = Jsoup.parse(html).body().text();

This will get you "Breaking News Updates".
