
Java Regex to Match XML tags

I am trying to figure out a regex pattern to match XML tags.

I have two kinds of XML tags. The first kind looks like this:

<myTag value="One" value="Two">SomeContentHere</myTag>

I can match this tag with the following regex pattern:

<myTag[\s\S]*?>[\s\S]*?<\/myTag>

The second kind is the same tag in its self-closing form, <myTag value="One" value="Two"/> . I am struggling to find a regex that matches this kind of tag. As with the first kind, I need to match the entire tag. My objective is a single regex pattern that captures both of the above cases.

I tried something like <myTag[\s\S]*?>[\s\S]*?[<\/myTag>]? but this pattern fails to capture my first kind of tag.

Kindly help me.

There are plenty of answers in this community on why it's a bad idea to use regex for this. That said, here is one approach to the problem: convert your string to a Document, which is possible whenever the string is valid XML, and then look for the desired tag in the Document. The code is:

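// Imports needed by the two helper methods below (both belong inside a class).
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Returns true if the XML string contains at least one element named tagName.
private boolean containsTag(String xml, String tagName)
{
    Document doc = getDocument(xml);
    if ( doc != null )
    {
        NodeList list = doc.getElementsByTagName(tagName);
        return list != null && list.getLength() > 0;
    }
    return false;
}

// Parses the string into a DOM Document; returns null if parsing fails.
private static Document getDocument(String xml)
{
    try
    {
        DocumentBuilder docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return docBuilder.parse(new InputSource(new StringReader(xml)));
    }
    catch (Exception e)
    {
        // Parser configuration errors and malformed XML both end up here.
        e.printStackTrace();
    }
    return null;
}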
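To illustrate the Document approach on both tag forms, here is a minimal, self-contained sketch. The class name MyTagDomDemo, the helper describeTags, the wrapping <root> element and the sample input are assumptions made for illustration and are not part of the answer above. It only uses the standard JAXP/DOM calls already shown (DocumentBuilder.parse, getElementsByTagName) plus Element.getAttribute and getTextContent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class MyTagDomDemo {

    // Hypothetical helper: lists every element named tagName together with
    // its "value" attribute and its text content.
    static List<String> describeTags(String xml, String tagName) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element e = (Element) nodes.item(i);
                // getAttribute returns "" when the attribute is absent;
                // getTextContent returns "" for an empty element.
                result.add("value=" + e.getAttribute("value")
                         + " text=" + e.getTextContent());
            }
            return result;
        } catch (Exception ex) {
            // Malformed XML or parser configuration problems.
            throw new IllegalArgumentException("Could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        // Assumed sample input: both tag forms wrapped in a single root element.
        String xml = "<root>"
                   + "<myTag value=\"One\">SomeContentHere</myTag>"
                   + "<myTag value=\"Two\"/>"
                   + "</root>";
        for (String line : describeTags(xml, "myTag")) {
            System.out.println(line);
        }
    }
}
```

The parser treats the paired form and the self-closing form identically, so no special case is needed for either. Note that the sample tag in the question repeats the value attribute; a conforming XML parser rejects duplicate attribute names on one element, so the sketch uses a single value attribute per element.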

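• For your first type of tag, use: (<myTag)([\s\S]*?)(>)([\s\S]*?)(<\/myTag>)

• For your second type of tag, use: (<myTag)([\s\S]*?)(\/>)

• For both types at the same time, use: (<myTag)([\s\S]*?)(\/>)|(<myTag)([\s\S]*?)(>)([\s\S]*?)(<\/myTag>)

Note that ([\s\S]*?) also matches > characters. If a regular <myTag ...>...</myTag> element is followed later in the input by a self-closing <myTag .../>, the second pattern (and the first alternative of the combined pattern) can match from the first <myTag all the way to that /> . Restricting the attribute part to [^>]*? avoids this.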
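As a rough sketch of how a combined pattern might be used from Java, the example below tries a slightly tightened variant rather than the exact patterns listed above: the attribute part is written as [^>]*? so that it cannot run past the end of the opening tag, and \b keeps it from matching longer names such as <myTagOther>. The class name MyTagRegexDemo, the helper findMyTags and the sample input are assumptions made for illustration. Like any regex approach to XML, it does not handle nested <myTag> elements or attribute values that contain > .

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MyTagRegexDemo {

    // First alternative: self-closing form ending in "/>".
    // Second alternative: opening tag, lazy content, then "</myTag>".
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern MY_TAG =
        Pattern.compile("<myTag\\b[^>]*?(?:/>|>[\\s\\S]*?</myTag>)");

    // Hypothetical helper: returns every matched tag as a string, in input order.
    static List<String> findMyTags(String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = MY_TAG.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Assumed sample input containing both tag forms from the question.
        String input = "<root>"
                     + "<myTag value=\"One\" value=\"Two\">SomeContentHere</myTag>"
                     + "<myTag value=\"One\" value=\"Two\"/>"
                     + "</root>";
        for (String tag : findMyTags(input)) {
            System.out.println(tag);
        }
    }
}
```

With this input, the two tags come back as two separate matches, each covering the whole tag.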

