
Java REGEX - Not able to remove content inside tag

This is my input text:

[QUOTE=SynapseBreak;104047835]Armchio de dragon is satki dragon lai de leh

[URL="https://play.google.com/store/apps/details?id=com.shiportal.hwzreader&referrer=utm_source%3Dsignature%26utm_medium%3Dforum"]Sent from 權志-龍 using GAGT[/URL][/QUOTE]
why satki ? tell me :s13:

[QUOTE=articated;104047854]I not sad lah
U happy i happy kym

Just for fun loh :s12:
[ms]自從我變成了狗屎,就再也沒有人敢踩在我頭上了 HardwareZone Forums app[/ms][/QUOTE]
today arti jin sweet make me happy :s12:

[QUOTE=Iandao;104047967]Gg mbs now...[/QUOTE]
go there jiak simi ??

I am trying to remove all the content inside the [QUOTE] [/QUOTE] tags, along with the tags themselves.

I want the output to be:

why satki ? tell me :s13: today arti jin sweet make me happy :s12: go there jiak simi ??  

The code I tried is:

string.replaceAll("\\[QUOTE.*\\[/QUOTE\\]", "")

Note that you may use the following fix for your pattern only if the input does not contain nested [QUOTE] tags.

The . in your regex does not match line breaks, and .* is too greedy, i.e. it matches up to the last occurrence of [/QUOTE] on a line (or in the whole string once . is allowed to match line breaks).

Use lazy dot matching together with (?s), the inline (embedded flag) form of Pattern.DOTALL, which makes . match any character, including line breaks:

"(?s)\\[QUOTE=.*?\\[/QUOTE\\]" 
 ^^^^         ^^^

See this regex demo.
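To make the difference concrete, here is a minimal sketch that runs the original pattern, a greedy (?s) variant, and the lazy (?s) pattern over a small made-up input. The class name QuoteRegexDemo, the strip helper, and the sample string are hypothetical and only serve as illustration.

```java
class QuoteRegexDemo {
    // Removes every match of the given regex from the input.
    public static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Hypothetical sample: two quotes, the first one spanning two lines.
        String input = "[QUOTE=a;1]line one\nline two[/QUOTE]\nreply A\n"
                     + "[QUOTE=b;2]x[/QUOTE]\nreply B";

        // Original attempt: '.' stops at line breaks, so the two-line quote survives.
        System.out.println(strip(input, "\\[QUOTE.*\\[/QUOTE\\]"));
        System.out.println("-----");

        // (?s) with greedy .* runs to the LAST [/QUOTE] and also removes "reply A".
        System.out.println(strip(input, "(?s)\\[QUOTE=.*\\[/QUOTE\\]"));
        System.out.println("-----");

        // (?s) with lazy .*? stops at the nearest [/QUOTE], keeping both replies.
        System.out.println(strip(input, "(?s)\\[QUOTE=.*?\\[/QUOTE\\]"));
    }
}
```

Only the third call keeps both replies and drops both quotes.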

Or, unroll the lazy dot (to make the pattern find matches faster) as:

"\\[QUOTE=[^\\[]*(?:\\[(?!/QUOTE\\])[^\\[]*)*\\[/QUOTE\\]"

See this regex demo.
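The unrolled pattern can be hard to read at a glance. As a sketch, the same pattern is written below in free-spacing mode (Pattern.COMMENTS) so that each part carries a comment. The class name UnrolledQuotePattern and the strip helper are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

class UnrolledQuotePattern {
    // Same unrolled pattern, compiled with Pattern.COMMENTS so that whitespace
    // and '#' comments inside the regex are ignored.
    static final Pattern UNROLLED = Pattern.compile(
        "\\[QUOTE=           # literal opening '[QUOTE='\n" +
        "[^\\[]*             # a run of characters other than '['\n" +
        "(?:                 # then, zero or more times:\n" +
        "  \\[(?!/QUOTE\\])  #   a '[' that does not begin '[/QUOTE]'\n" +
        "  [^\\[]*           #   followed by another run of non-'[' characters\n" +
        ")*\n" +
        "\\[/QUOTE\\]        # literal closing tag\n",
        Pattern.COMMENTS);

    // Removes every non-nested [QUOTE=...]...[/QUOTE] block.
    public static String strip(String input) {
        return UNROLLED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample with another tag and a line break inside the quote.
        String input = "[QUOTE=a;1]x [b]bold[/b]\ny[/QUOTE]\nreply";
        System.out.println(strip(input));
    }
}
```

No (?s) is needed here, because the negated class [^\[] already matches line breaks.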

Java demo:

// Unrolled pattern: matches one non-nested [QUOTE=...]...[/QUOTE] block, line breaks included.
String pat = "\\[QUOTE=[^\\[]*(?:\\[(?!/QUOTE])[^\\[]*)*\\[/QUOTE]";
String str = "[QUOTE=SynapseBreak;104047835]Armchio de dragon is satki dragon lai de leh\n\n[URL=\"https://play.google.com/store/apps/details?id=com.shiportal.hwzreader&referrer=utm_source%3Dsignature%26utm_medium%3Dforum\"]Sent from 權志-龍 using GAGT[/URL][/QUOTE]\nwhy satki ? tell me :s13:\n[QUOTE=articated;104047854]I not sad lah\nU happy i happy kym\n\nJust for fun loh :s12:\n[ms]自從我變成了狗屎,就再也沒有人敢踩在我頭上了 HardwareZone Forums app[/ms][/QUOTE]\ntoday arti jin sweet make me happy :s12:\n\n[QUOTE=Iandao;104047967]Gg mbs now...[/QUOTE]\ngo there jiak simi ??";
String res = str.replaceAll(pat, "");
System.out.println(res);
// Prints (after a leading blank line):
// why satki ? tell me :s13:
//
// today arti jin sweet make me happy :s12:
//
//
// go there jiak simi ??
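The patterns above assume that [QUOTE] tags are never nested. If nesting can occur, one possible workaround (an assumption on my part, not something from the original answer) is to remove only innermost quotes and repeat until nothing changes. The class name NestedQuoteStripper and the sample input are hypothetical.

```java
import java.util.regex.Pattern;

class NestedQuoteStripper {
    // Matches an innermost quote only: the tempered dot (?:(?!\[QUOTE=).)
    // refuses to step over another opening "[QUOTE=".
    private static final Pattern INNERMOST =
        Pattern.compile("(?s)\\[QUOTE=(?:(?!\\[QUOTE=).)*?\\[/QUOTE\\]");

    // Strips innermost quotes repeatedly until the text stops changing.
    public static String strip(String input) {
        String previous;
        String current = input;
        do {
            previous = current;
            current = INNERMOST.matcher(previous).replaceAll("");
        } while (!current.equals(previous));
        return current;
    }

    public static void main(String[] args) {
        // Hypothetical nested sample.
        String input = "[QUOTE=a;1]outer [QUOTE=b;2]inner[/QUOTE] tail[/QUOTE]\nreply";
        System.out.println(strip(input));
    }
}
```

Each pass peels off one level of nesting. Unbalanced tags (an opening tag with no matching close, or the reverse) are left in place, so this is a sketch rather than a full BBCode parser.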

Your regex does not take newlines into account. Add (?s) at the beginning so that . also matches line breaks, and make the quantifier lazy (.*?) so that each match stops at the nearest [/QUOTE]:

string.replaceAll("(?s)\\[QUOTE.*?\\[/QUOTE\\]", "");
(?s)\\[QUOTE.*?\\[/QUOTE\\]

Try the above regex. It will work.
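For reference, the same regex can be compiled once with the Pattern.DOTALL flag instead of the inline (?s). The sketch below also collapses the leftover whitespace, on the assumption that the single-line output shown in the question is what is wanted. The class name QuoteCleaner and the clean helper are hypothetical.

```java
import java.util.regex.Pattern;

class QuoteCleaner {
    // Same regex as above, with DOTALL passed as a flag instead of the inline (?s).
    private static final Pattern QUOTE =
        Pattern.compile("\\[QUOTE.*?\\[/QUOTE\\]", Pattern.DOTALL);

    public static String clean(String input) {
        String withoutQuotes = QUOTE.matcher(input).replaceAll("");
        // Assumption: runs of whitespace (including the blank lines left behind
        // by removed quotes) should become a single space.
        return withoutQuotes.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String input = "[QUOTE=Iandao;104047967]Gg mbs now...[/QUOTE]\ngo there jiak simi ??";
        System.out.println(clean(input)); // prints: go there jiak simi ??
    }
}
```

Precompiling the pattern avoids recompiling the regex on every call, which String.replaceAll does internally.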
