I have an XML file that I am parsing. Some tag names occur multiple times, under different parent elements. I know which parent's children I want to ignore. How can I do that?
<sub-article id="S01" article-type="translation" xml:lang="pt">
<front-stub>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Artigos Originais</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>
Prevalência de deficiência nutricional em pacientes com
tuberculose pulmonar
<xref ref-type="fn" rid="fn02">*</xref>
</article-title>
</title-group>
</front-stub>
</sub-article>
.....
.....
<article-meta>
<article-id pub-id-type="pmid">24068270</article-id>
<article-id pub-id-type="pmc">4075858</article-id>
<article-id pub-id-type="publisher-id">S1806-37132013000400012</article-id>
<article-id pub-id-type="doi">10.1590/S1806-37132013000400012</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Original Articles</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>
Prevalence of nutritional deficiency in patients with
pulmonary tuberculosis
<xref ref-type="fn" rid="fn01">*</xref>
</article-title>
</title-group>
</article-meta>
In this example, I don't want to process the children of the sub-article tag. So "article-title" should be processed only for "Prevalence of nutritional deficiency in patients with pulmonary tuberculosis", not for "Prevalência de deficiência nutricional em pacientes com tuberculose pulmonar".
I am currently using the following code, which returns all the nodes named "title-group". How can I make it more specific so that I don't get nodes from a certain parent?
NodeList titleNodeList = document.getElementsByTagName("title-group");
Just search for "title-group" nodes under "sub-article" nodes:
List<Node> allTitleGroupNodes = new ArrayList<>();
NodeList subArticleNodes = document.getElementsByTagName("sub-article");
for (int i = 0; i < subArticleNodes.getLength(); i++) {
    // item() returns a Node, and getElementsByTagName is not defined on Node, so cast to Element first
    Element subArticle = (Element) subArticleNodes.item(i);
    NodeList titleNodes = subArticle.getElementsByTagName("title-group");
    for (int j = 0; j < titleNodes.getLength(); j++) {
        allTitleGroupNodes.add(titleNodes.item(j));
    }
}
(Aside: the horrible interface of NodeList is one of the things I hate most about processing XML in standard Java.)
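The loop above collects the "title-group" nodes that sit under "sub-article". If the goal is to skip those instead, as the question asks, one option is to take every "title-group" in the document and drop the ones that have a "sub-article" ancestor. The following is only a minimal sketch under that assumption: the class name TitleGroupFilter, the helper methods, and the shortened sample XML are all made up for illustration and are not part of the original post.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not taken from the original answer.
class TitleGroupFilter {

    // Returns true if any ancestor of the node is an element with the given tag name.
    static boolean hasAncestor(Node node, String ancestorName) {
        for (Node p = node.getParentNode(); p != null; p = p.getParentNode()) {
            if (p.getNodeType() == Node.ELEMENT_NODE && ancestorName.equals(p.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // Collects every <tagName> element that is not nested inside <excludedAncestor>.
    static List<Element> elementsOutside(Document document, String tagName, String excludedAncestor) {
        List<Element> result = new ArrayList<>();
        NodeList candidates = document.getElementsByTagName(tagName);
        for (int i = 0; i < candidates.getLength(); i++) {
            Node candidate = candidates.item(i);
            if (!hasAncestor(candidate, excludedAncestor)) {
                result.add((Element) candidate);
            }
        }
        return result;
    }

    // Parses an XML string into a DOM document (checked exceptions are wrapped for brevity).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened, made-up stand-in for the XML in the question.
        String xml = "<article>"
                + "<front><article-meta><title-group><article-title>English</article-title></title-group></article-meta></front>"
                + "<sub-article><front-stub><title-group><article-title>Portuguese</article-title></title-group></front-stub></sub-article>"
                + "</article>";
        for (Element titleGroup : elementsOutside(parse(xml), "title-group", "sub-article")) {
            System.out.println(titleGroup.getTextContent());
        }
    }
}
```

The ancestor check walks up the tree for each candidate, so it does not matter how deeply the "title-group" is nested under "sub-article".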
There are two ways to achieve this using XPath:
1. Select elements only if they are under <article-meta>.
2. Exclude elements if they are under <sub-article>.
Personally, I prefer the first one, since it is more explicit and keeps working across different XML files.
Use XPath to select elements only if they're under <article-meta>:
//article-meta//title-group
Java:
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("//article-meta//title-group");
NodeList titleNodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
Use XPath to exclude elements if they're under <sub-article>. I assume that the XML root element is <article> (please adjust the expression if that's not the case):
/article/*[not(self::sub-article)]//title-group
Java:
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xPath.compile("/article/*[not(self::sub-article)]//title-group");
NodeList titleNodes = (NodeList) expr.evaluate(document, XPathConstants.NODESET);
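To try the expressions end to end, here is a small self-contained sketch. The class name XPathTitleDemo, the selectTitles helper, and the shortened sample XML are made up for illustration. It also tries a third expression that is not in the answer above, //title-group[not(ancestor::sub-article)], which uses the standard XPath 1.0 ancestor axis and should not depend on the name of the root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not taken from the original answer.
class XPathTitleDemo {

    // Shortened, made-up stand-in for the XML in the question.
    static final String SAMPLE_XML = "<article>"
            + "<front><article-meta><title-group><article-title>English title</article-title></title-group></article-meta></front>"
            + "<sub-article><front-stub><title-group><article-title>Portuguese title</article-title></title-group></front-stub></sub-article>"
            + "</article>";

    // Evaluates the XPath expression against the XML and returns the trimmed text of each match.
    static List<String> selectTitles(String xml, String expression) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(expression, document, XPathConstants.NODESET);
            List<String> titles = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                titles.add(nodes.item(i).getTextContent().trim());
            }
            return titles;
        } catch (Exception e) {
            throw new IllegalStateException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String[] expressions = {
            "//article-meta//title-group",                      // include by parent
            "/article/*[not(self::sub-article)]//title-group",  // exclude by parent, root assumed to be <article>
            "//title-group[not(ancestor::sub-article)]"         // exclude by ancestor, root name not needed
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + selectTitles(SAMPLE_XML, expression));
        }
    }
}
```

On this sample, all three expressions select only the title group under <article-meta>. They can differ on other inputs: for example, the second one would no longer match if the root element had a different name.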