Regex to detect HTML tags and replace only some of them

Given XML like this:

<Abc>
    <Bcd>C</Bcd>
    <Def>E</Def>
    <Fgh>G</Fgh>
</Abc>

What would be the correct Java regex to change it to the following?

-Abc-
    -Bcd-C-/Bcd-
    <Def>E</Def>
    <Fgh>G</Fgh>
-/Abc-

I got as far as:

myString.replaceAll("\\<(.*?)\\>","-$1-");

but I can't get the exclusions right, so that only some tags are changed.

Update: Negating the pattern from Pshemo's answer with a negative lookahead did the trick:

myString.replaceAll("<(?!/?(?:KWD|COM|PLN|PUN|TYP|STR)>)([^>]*)>","&lt;$1&gt;")
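What makes the "negation" work is a negative lookahead, `(?!...)`: it rejects a `<` when an optional `/`, a whitelisted name and `>` follow it, and lets every other tag through to be escaped. A minimal sketch, assuming the six whitelisted names from the update and tags that contain no `>` inside attribute values; the class and method names are hypothetical:

```java
import java.util.regex.Pattern;

public class EscapeExceptWhitelist {
    // Hypothetical whitelist taken from the update above. The lookahead fails
    // (so this '<' is left alone) when "/?NAME>" follows; otherwise group 1
    // captures the tag body up to the next '>'.
    private static final Pattern NON_WHITELISTED_TAG =
            Pattern.compile("<(?!/?(?:KWD|COM|PLN|PUN|TYP|STR)>)([^>]*)>");

    // Escapes every tag except the whitelisted ones as &lt;...&gt;.
    public static String escape(String input) {
        return NON_WHITELISTED_TAG.matcher(input).replaceAll("&lt;$1&gt;");
    }

    public static void main(String[] args) {
        // Prints: <KWD>int</KWD> &lt;b&gt;x&lt;/b&gt;
        System.out.println(escape("<KWD>int</KWD> <b>x</b>"));
    }
}
```

Because the lookahead requires `>` immediately after the name, a tag such as `<KWDX>` is still escaped rather than mistaken for `<KWD>`.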

Try

"\\<[AB]\\>"

as your regular expression. It matches <, then a single A or B, then >. For more information, see http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
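The difference between a character class and alternation is easiest to see side by side. In Java the backslashes in `\\<` and `\\>` are optional, because a backslash before a non-alphabetic character simply makes it literal, so this sketch leaves them out. The class and method names are hypothetical:

```java
public class CharClassVsAlternation {
    // [AB] is a character class: exactly one character, 'A' or 'B'.
    public static String singleChar(String s) {
        return s.replaceAll("<([AB])>", "-$1-");
    }

    // (Abc|Bcd) is alternation: the whole name "Abc" or the whole name "Bcd".
    public static String wholeName(String s) {
        return s.replaceAll("<(Abc|Bcd)>", "-$1-");
    }

    public static void main(String[] args) {
        String input = "<A> <B> <Abc> <Bcd>";
        System.out.println(singleChar(input)); // -A- -B- <Abc> <Bcd>
        System.out.println(wholeName(input));  // <A> <B> -Abc- -Bcd-
    }
}
```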

To match whole tag names rather than single characters, you can put an alternation inside a group, like this:

public class RegexString
{
    private static String myString = "<Abc>\n" 
            + "    <Bcd>C</Bcd>\n"
            + "    <Def>E</Def>\n" 
            + "    <Fgh>G</Fgh>\n" 
            + "</Abc>";
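    // Group 1 includes the optional '/', so closing tags become -/Abc- as the question asks.
    private static final String HTML_TAGS = "(/?(?:Abc|Bcd))";
    private static final String FULL_REGEX = "<" + HTML_TAGS + ">";

    public static void main(String[] args)
    {
        // Tags not listed in HTML_TAGS (Def, Fgh) are left untouched.
        System.out.println(myString.replaceAll(FULL_REGEX, "-$1-"));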
    }
}

Using regex on HTML is not a good idea, as pointed out in this question, but if you really want to complicate your life, you can use the OR operator | like this:

String yourData = "<Abc>\n" 
                + "    <Bcd>C</Bcd>\n"
                + "    <Def>E</Def>\n" 
                + "    <Fgh>G</Fgh>\n" 
                + "</Abc>";
System.out.println(yourData.replaceAll("</?(Abc|Bcd)>", "-$1-"));

Output (note that the / of closing tags is dropped, because it sits outside the capturing group):

-Abc-
    -Bcd-C-Bcd-
    <Def>E</Def>
    <Fgh>G</Fgh>
-Abc-
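If you would rather follow the advice above and avoid regex on markup, the same transformation can be sketched with the JDK's built-in DOM parser (`javax.xml.parsers`). This is a minimal sketch under loose assumptions: the input is well-formed XML, elements carry no attributes, and text does not need re-escaping; `DomTagRewriter` and `rewrite` are hypothetical names:

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTagRewriter {
    // Parses the XML and re-serializes it, writing elements named in `names`
    // as -name- ... -/name- and every other element as a normal tag pair.
    public static String rewrite(String xml, Set<String> names) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("input is not well-formed XML", e);
        }
        StringBuilder out = new StringBuilder();
        write(root, names, out);
        return out.toString();
    }

    private static void write(Node node, Set<String> names, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue()); // whitespace is kept; text is NOT re-escaped
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, CDATA sections, etc. are dropped in this sketch
        }
        String name = node.getNodeName();
        boolean replace = names.contains(name);
        out.append(replace ? "-" + name + "-" : "<" + name + ">");
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            write(children.item(i), names, out);
        }
        out.append(replace ? "-/" + name + "-" : "</" + name + ">");
    }

    public static void main(String[] args) {
        String xml = "<Abc>\n"
                + "    <Bcd>C</Bcd>\n"
                + "    <Def>E</Def>\n"
                + "    <Fgh>G</Fgh>\n"
                + "</Abc>";
        System.out.println(rewrite(xml, Set.of("Abc", "Bcd")));
    }
}
```

With the set {Abc, Bcd} this prints the output the question asks for, closing slashes included. Empty elements such as `<X/>` come out as `<X></X>` in this sketch.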
