I have some content and I would like to know whether it is XML or not. How can I do that? I only need a true or false answer as the return value of a method. I plan to use a regex, but I am open to better suggestions.
The XML content is as follows and will always have the same format (only the molecule IDs may increase or decrease):
<?xml version="1.0" encoding="UTF-8"?>
<molecules>
  <molecule id="1">
    <atoms>
      <atom id="1" symbol="C"/>
      <atom id="2" symbol="C"/>
      <atom id="3" symbol="N"/>
    </atoms>
    <bonds>
      <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
      <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
    </bonds>
  </molecule>
  <molecule id="2">
    <atoms>
      <atom id="1" symbol="C"/>
      <atom id="2" symbol="C"/>
      <atom id="3" symbol="N"/>
    </atoms>
    <bonds>
      <bond id="1" atomAId="1" atomBId="2" order="SINGLE"/>
      <bond id="2" atomAId="2" atomBId="3" order="DOUBLE"/>
    </bonds>
  </molecule>
</molecules>
I wrote the following regex to recognize the XML:
public static final String REGEX_FOR_XML = "((<(\\S(.*?))(\\s.*?)?>(.*?)<\\/\\3>)|(<\\S(.*?)(.*?)(\\/>)))";
The issue is that it only matches the inner content, whereas I want it to match the entire content. I use this validator for matching:
// Requires: import java.util.regex.Pattern;
public static boolean isValidXML(String inXMLStr) {
    if (inXMLStr == null || inXMLStr.isEmpty())
        return false;
    final Pattern pattern = Pattern.compile(Constants.REGEX_FOR_XML);
    // matches() only succeeds if the regex matches the entire input string
    return pattern.matcher(inXMLStr).matches();
}
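The small demo below shows why this validator rejects the full document. It reuses the question's regex unchanged. The class name `RegexXmlDemo` and its helper methods are hypothetical. In Java, `.` does not match line terminators unless `Pattern.DOTALL` is set, and `matches()` requires the whole input to match. As a result, `matches()` fails on a small multi-line fragment, while `find()` still locates a single inner `<atom .../>` element.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; REGEX_FOR_XML is copied verbatim from the question.
final class RegexXmlDemo {
    static final String REGEX_FOR_XML =
            "((<(\\S(.*?))(\\s.*?)?>(.*?)<\\/\\3>)|(<\\S(.*?)(.*?)(\\/>)))";
    static final Pattern PATTERN = Pattern.compile(REGEX_FOR_XML);

    // Same check as isValidXML: the regex must match the entire input.
    static boolean matchesWhole(String s) {
        return PATTERN.matcher(s).matches();
    }

    // Looser check: the regex matches some substring of the input.
    static boolean findsFragment(String s) {
        return PATTERN.matcher(s).find();
    }

    public static void main(String[] args) {
        String single = "<atom id=\"1\" symbol=\"C\"/>";
        String multiLine = "<atoms>\n<atom id=\"1\" symbol=\"C\"/>\n</atoms>";
        System.out.println(matchesWhole(single));     // true
        System.out.println(matchesWhole(multiLine));  // false: '.' does not cross '\n'
        System.out.println(findsFragment(multiLine)); // true: finds the inner <atom .../>
    }
}
```

Note that `java.util.regex` has no recursive patterns, so no regex of this kind can check arbitrarily nested elements in general.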
How can I correct the regex so that it matches the whole XML content, or what would be a better option?
There is an infamous answer about using regex for XML parsing, which I will not link to (@Henrik did anyway ;P) or go into. But the bottom line is: regex is very rarely a good idea for XML validation (or parsing, for that matter).
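If you only need a true or false answer to "is this XML?", you can let the JDK's built-in SAX parser try to parse the string. Any parse error then means "not XML". This is a minimal sketch, and the class and method names are hypothetical. It checks well-formedness only, not whether the document follows the molecules structure.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class: answers "is this string well-formed XML?"
final class XmlWellFormedCheck {

    static boolean isWellFormedXml(String xml) {
        if (xml == null || xml.isEmpty()) {
            return false;
        }
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler ignores all content; its fatalError() rethrows,
            // so any well-formedness error surfaces as a SAXException.
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create a SAX parser", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<molecules><molecule id=\"1\"/></molecules>")); // true
        System.out.println(isWellFormedXml("<molecules><molecule id=\"1\"></molecules>"));  // false
        System.out.println(isWellFormedXml("just some text"));                             // false
    }
}
```

This accepts any well-formed XML, even `<foo/>`. To also check the molecules structure itself, you need schema validation.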
I suggest you go here: XML validation Oracle Docs
I think it is what you want. In Java you can use schema validation to validate XML, which is what you want to do if I read the question correctly.
What you will have to do is write a schema definition (XSD) instead of a regex. This is not only the correct and straightforward way to go; it will also be much easier to maintain. It is not rocket science either. Your format seems pretty clear and could easily be condensed into an XSD. There are also tools that can generate one for you, though their output may still need some fine-tuning.
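Here is a sketch of that approach. It assumes Java 15+ for the text block. A hypothetical XSD for the molecules format from the question is embedded as a string and applied with the standard `javax.xml.validation` API (`SchemaFactory`, `Schema`, `Validator`). The schema is only a guess at the format; for example, it accepts any string for `symbol` and `order`. Treat it as a starting point rather than a definitive definition. In a real project it would more likely live in its own `.xsd` file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical class; the embedded XSD is an illustrative guess at the molecules format.
final class MoleculesSchemaCheck {

    private static final String MOLECULES_XSD = """
        <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
          <xs:element name="molecules">
            <xs:complexType>
              <xs:sequence>
                <xs:element name="molecule" maxOccurs="unbounded">
                  <xs:complexType>
                    <xs:sequence>
                      <xs:element name="atoms">
                        <xs:complexType>
                          <xs:sequence>
                            <xs:element name="atom" maxOccurs="unbounded">
                              <xs:complexType>
                                <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
                                <xs:attribute name="symbol" type="xs:string" use="required"/>
                              </xs:complexType>
                            </xs:element>
                          </xs:sequence>
                        </xs:complexType>
                      </xs:element>
                      <xs:element name="bonds">
                        <xs:complexType>
                          <xs:sequence>
                            <xs:element name="bond" minOccurs="0" maxOccurs="unbounded">
                              <xs:complexType>
                                <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
                                <xs:attribute name="atomAId" type="xs:positiveInteger" use="required"/>
                                <xs:attribute name="atomBId" type="xs:positiveInteger" use="required"/>
                                <xs:attribute name="order" type="xs:string" use="required"/>
                              </xs:complexType>
                            </xs:element>
                          </xs:sequence>
                        </xs:complexType>
                      </xs:element>
                    </xs:sequence>
                    <xs:attribute name="id" type="xs:positiveInteger" use="required"/>
                  </xs:complexType>
                </xs:element>
              </xs:sequence>
            </xs:complexType>
          </xs:element>
        </xs:schema>
        """;

    // Compile the schema once; Schema objects are immutable and thread-safe.
    private static final Schema SCHEMA = compileSchema();

    private static Schema compileSchema() {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(MOLECULES_XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("Embedded schema is invalid", e);
        }
    }

    /** True if xml is well-formed and conforms to MOLECULES_XSD; false otherwise. */
    static boolean isValidMoleculesXml(String xml) {
        if (xml == null || xml.isEmpty()) {
            return false;
        }
        try {
            // Validator is not thread-safe, so create one per call.
            Validator validator = SCHEMA.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            // Default error handling throws on both well-formedness and validity errors.
            return false;
        }
    }

    public static void main(String[] args) {
        String ok = "<molecules><molecule id=\"1\"><atoms><atom id=\"1\" symbol=\"C\"/></atoms>"
                + "<bonds/></molecule></molecules>";
        String missingId = ok.replace("<molecule id=\"1\">", "<molecule>");
        System.out.println(isValidMoleculesXml(ok));        // true
        System.out.println(isValidMoleculesXml(missingId)); // false
    }
}
```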
Note: I know that link-only answers are discouraged on SO, but the resource is too big to copy into the answer (at least IMHO), and Oracle may hold copyright on it. Also, since these are official Oracle docs, the link is probably not prone to breaking.