
How do I write a proper regex to recognize XML content?

I have some content and would like to know whether it is XML or not. How can I do that? I only need a true or false answer as the method's return value. I plan to use a regex, but I am open to better suggestions.

The XML content is as follows and will always be in the same format (although the molecule IDs may increase or decrease):

<?xml version="1.0" encoding="UTF-8"?>
<molecules>
    <molecule id="1">
        <atoms>
            <atom id="1" symbol="C"/>
            <atom id="2" symbol="C"/>
            <atom id="3" symbol="N"/>
        </atoms>
        <bonds>
            <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
            <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
        </bonds>
    </molecule>
    <molecule id="2">
        <atoms>
            <atom id="1" symbol="C"/>
            <atom id="2" symbol="C"/>
            <atom id="3" symbol="N"/>
        </atoms>
        <bonds>
            <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
            <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
        </bonds>
    </molecule>
</molecules> 

I wrote the following regex to recognize the XML:

public static final String REGEX_FOR_XML = "((<(\\S(.*?))(\\s.*?)?>(.*?)<\\/\\3>)|(<\\S(.*?)(.*?)(\\/>)))";

The issue is that it only matches the inner content, while I would like it to match the entire content. I use this validator for matching:

public static boolean isValidXML(String inXMLStr) {

    if (inXMLStr == null || inXMLStr.isEmpty())
        return false;

    final Pattern pattern = Pattern.compile(Constants.REGEX_FOR_XML);
    if (pattern.matcher(inXMLStr).matches()) {
        return true;
    }
    return false;
}

How can I correct the regex to match the XML content, or what would be a better option?

There is an infamous answer about using regex for XML parsing, which I will not link (@Henrik did anyway ;P) or go into. Bottom line: regex is very rarely a good idea for XML validation (or for parsing, for that matter).
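If all you need is a true/false answer to "is this XML at all?", a parser can give you that without any regex. The following is only a minimal sketch, not the one true way: the class and method names are made up for illustration, and the `disallow-doctype-decl` feature is assumed to be supported by the parser in use (it is a Xerces feature, which the JDK's built-in parser understands).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the question's code.
class XmlWellFormedCheck {

    /** Returns true if the text parses as well-formed XML, false otherwise. */
    static boolean isWellFormedXml(String text) {
        if (text == null || text.isEmpty()) {
            return false;
        }
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Assumption: the parser supports this Xerces feature.
            // It rejects DOCTYPE declarations, which guards against XXE input.
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (Exception e) {
            // Any parse or configuration failure means "not usable XML" here.
            return false;
        }
    }
}
```

Note that this only checks well-formedness (matching tags, a single root element, and so on). It says nothing about whether the document actually has the molecules structure; that is what schema validation is for.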

I suggest you go here: XML validation Oracle Docs

I think that is what you want: in Java you can use schema validation to validate XML, which is what you are trying to do if I read the question correctly.

What you will have to do is write a schema definition instead of a regex. This is not only the "correct and straightforward" way to go, it will also be much easier to maintain. It is not rocket science either, and your format seems clear and simple enough to be condensed into an XSD. There are also tools that can help you do that, though their output might still need some fine-tuning.
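To give a rough idea of what that could look like, here is a sketch using the standard `javax.xml.validation` API. The XSD is my own guess, inferred from the sample in the question: the attribute types (`xs:positiveInteger`, `xs:string`), the occurrence constraints, and the class name are all assumptions you would need to adjust to your real rules.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;

// Hypothetical class; the schema below is inferred from the sample, not authoritative.
class MoleculeXmlValidator {

    static final String MOLECULES_XSD = String.join("\n",
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>",
        " <xs:element name='molecules'><xs:complexType><xs:sequence>",
        "  <xs:element name='molecule' maxOccurs='unbounded'><xs:complexType>",
        "   <xs:sequence>",
        "    <xs:element name='atoms'><xs:complexType><xs:sequence>",
        "     <xs:element name='atom' maxOccurs='unbounded'><xs:complexType>",
        "      <xs:attribute name='id' type='xs:positiveInteger' use='required'/>",
        "      <xs:attribute name='symbol' type='xs:string' use='required'/>",
        "     </xs:complexType></xs:element>",
        "    </xs:sequence></xs:complexType></xs:element>",
        "    <xs:element name='bonds'><xs:complexType><xs:sequence>",
        "     <xs:element name='bond' minOccurs='0' maxOccurs='unbounded'><xs:complexType>",
        "      <xs:attribute name='id' type='xs:positiveInteger' use='required'/>",
        "      <xs:attribute name='atomAId' type='xs:positiveInteger' use='required'/>",
        "      <xs:attribute name='atomBId' type='xs:positiveInteger' use='required'/>",
        "      <xs:attribute name='order' type='xs:string' use='required'/>",
        "     </xs:complexType></xs:element>",
        "    </xs:sequence></xs:complexType></xs:element>",
        "   </xs:sequence>",
        "   <xs:attribute name='id' type='xs:positiveInteger' use='required'/>",
        "  </xs:complexType></xs:element>",
        " </xs:sequence></xs:complexType></xs:element>",
        "</xs:schema>");

    /** Returns true only if the text is well-formed XML that conforms to the XSD above. */
    static boolean isValidXML(String inXMLStr) {
        if (inXMLStr == null || inXMLStr.isEmpty()) {
            return false;
        }
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(MOLECULES_XSD)));
            Validator validator = schema.newValidator();
            // With no custom ErrorHandler, validation errors are thrown as exceptions.
            validator.validate(new StreamSource(new StringReader(inXMLStr)));
            return true;
        } catch (Exception e) {
            // SAXException (invalid or malformed input) or IOException.
            return false;
        }
    }
}
```

In real code you would probably keep the XSD in its own file and build the `Schema` once, since a `Schema` object can be reused while compiling it on every call is wasteful.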

Note: I know that "link-only" answers are discouraged on SO, but the resource is too big to copy into the answer (at least IMHO). There might also be some copyright on Oracle's part. And since these are the official Oracle docs, the link is probably not prone to breaking.
