I have a string named s:
String s = "<NOUN>Sam</NOUN> , a student of the University of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue Olympiad Hotel";
I want to remove all <NOUN> and </NOUN> tags from the string. I used this to remove them:
s.replaceAll("[<NOUN>,</NOUN>]","");
Yes, it removes the tags, but it also removes the letters 'U' and 'O' from the string, which gives me the following output:
Sam , a student of the niversity of oxford , won the Ethugalpura International Rating Chess Tournament which concluded on Dec.22 at the Blue lympiad Hotel
Can anyone please tell me how to do this correctly?
Try:
s = s.replaceAll("<NOUN>|</NOUN>", "");
In regex, the syntax [...] matches any single character inside the brackets, regardless of the order in which the characters appear. Therefore, in your example, every occurrence of "<", "N", "O", etc. is removed. Instead, use the pipe (|) to match either "<NOUN>" or "</NOUN>".
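A small sketch contrasting the two patterns on a shortened version of the question's string may make the difference concrete. The class and method names (TagStripDemo, withCharClass, withAlternation) are hypothetical, chosen only for illustration.

```java
class TagStripDemo {
    // Character class: removes every individual '<', 'N', 'O', 'U', '>', '/' and ','
    static String withCharClass(String s) {
        return s.replaceAll("[<NOUN>,</NOUN>]", "");
    }

    // Alternation: removes only the exact sequences "<NOUN>" or "</NOUN>"
    static String withAlternation(String s) {
        return s.replaceAll("<NOUN>|</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        System.out.println(withCharClass(s));   // Sam  a student of the niversity of oxford
        System.out.println(withAlternation(s)); // Sam , a student of the University of oxford
    }
}
```

Note that the character-class version also drops the comma, since ',' is one of the characters listed inside the brackets.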
The following should also work (and could be considered more DRY and elegant), since the "?" makes the forward slash optional, so the pattern matches the tag both with and without it:
s = s.replaceAll("</?NOUN>", "");
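As a quick check of the optional-slash pattern, the sketch below (hypothetical class OptionalSlashDemo) runs it on a few inputs. It also shows that a different tag such as <NOUNS> is left alone, because the pattern requires ">" directly after "NOUN".

```java
class OptionalSlashDemo {
    // "/?" makes the slash optional, so one pattern covers opening and closing tags
    static String stripNounTags(String s) {
        return s.replaceAll("</?NOUN>", "");
    }

    public static void main(String[] args) {
        String[] inputs = {"<NOUN>Sam</NOUN>", "</NOUN>x<NOUN>", "<NOUNS>kept</NOUNS>"};
        for (String in : inputs) {
            System.out.println(in + " -> " + stripNounTags(in));
        }
    }
}
```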
String.replaceAll() takes a regular expression as its first argument. The regexp:
"[<NOUN>,</NOUN>]"
defines, within the brackets, the set of characters to be matched and therefore removed. So you're asking to remove the characters <, >, /, N, O, U and the comma.
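Because a character class is just a set, repeated characters inside the brackets are ignored. The sketch below (hypothetical class CharClassDemo and sample string) shows that the original pattern behaves exactly like the shorter "[</>NOU,]", stripping uppercase O and U and commas wherever they occur.

```java
class CharClassDemo {
    // Duplicates inside [...] are ignored, so both patterns describe the same set
    static final String ORIGINAL = "[<NOUN>,</NOUN>]";
    static final String MINIMAL = "[</>NOU,]";

    public static void main(String[] args) {
        String sample = "One, Two </NOUN> Up";
        System.out.println(sample.replaceAll(ORIGINAL, "")); // ne Two  p
        System.out.println(sample.replaceAll(MINIMAL, ""));  // ne Two  p
    }
}
```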
Perhaps the simplest way to do what you want is:
s = s.replaceAll("<NOUN>", "").replaceAll("</NOUN>", "");
which is explicit about what it's removing. More complex regular expressions are of course possible.
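A related variant, swapped in here rather than taken from the answer: String.replace(CharSequence, CharSequence) replaces every occurrence of the target literally, with no regex involved, so characters like '<' or '/' never need escaping. A minimal sketch (hypothetical class LiteralReplaceDemo):

```java
class LiteralReplaceDemo {
    // String.replace matches the target text literally and replaces all occurrences
    static String stripNounTags(String s) {
        return s.replace("<NOUN>", "").replace("</NOUN>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> , a student of the University of oxford";
        System.out.println(stripNounTags(s)); // Sam , a student of the University of oxford
    }
}
```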
You can use a single regular expression for this, "<[/]*NOUN>", so
s = s.replaceAll("<[/]*NOUN>", "");
should do the trick. The "[/]*" matches zero or more "/" after the "<".
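Since "*" means zero or more, this pattern also accepts malformed tags like "<//NOUN>", whereas "/?" accepts at most one slash. The sketch below (hypothetical class QuantifierDemo) compares the two using a precompiled java.util.regex.Pattern, which avoids recompiling the regex on every call if the replacement runs many times.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // "[/]*" allows any number of slashes; "/?" allows at most one
    static final Pattern ZERO_OR_MORE = Pattern.compile("<[/]*NOUN>");
    static final Pattern ZERO_OR_ONE = Pattern.compile("</?NOUN>");

    static String strip(Pattern p, String s) {
        return p.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam<//NOUN>";
        System.out.println(strip(ZERO_OR_MORE, s)); // Sam
        System.out.println(strip(ZERO_OR_ONE, s));  // Sam<//NOUN>
    }
}
```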
Try this, which strips every tag rather than only <NOUN> and </NOUN>:
String result = originValue.replaceAll("\\<.*?>", "");
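The "?" after ".*" is what makes this work: it turns the quantifier lazy, so each match stops at the first ">". A greedy "<.*>" would instead run from the first "<" to the last ">" and swallow the text between tags. The sketch below (hypothetical class LazyTagDemo and sample string) compares the two; '<' has no special meaning here, so it is written without the backslash.

```java
class LazyTagDemo {
    // Lazy ".*?" stops at the first '>', so each tag is removed on its own
    static String stripAllTagsLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Greedy ".*" extends to the last '>', removing everything between the tags too
    static String stripAllTagsGreedy(String s) {
        return s.replaceAll("<.*>", "");
    }

    public static void main(String[] args) {
        String s = "<NOUN>Sam</NOUN> won at the <b>Blue</b> Hotel";
        System.out.println(stripAllTagsLazy(s));   // Sam won at the Blue Hotel
        System.out.println(stripAllTagsGreedy(s)); // " Hotel" (leading space kept)
    }
}
```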