简体   繁体   English

使用DOM读取XML文件

[英]Reading XML files with DOM

I have an XML file like this: 我有一个像这样的XML文件:

<?xml version="1.0" encoding="UTF-8"?>
<collection>
  <source />
  <date />
  <key />
  <document>
    <id>AIMed_d30</id>
    <passage>
      <offset>0</offset>
      <text>Isolation of human delta-catenin and its binding specificity with presenilin 1. We screened proteins for interaction with presenilin (PS) 1, and cloned the full-length cDNA of human delta-catenin, which encoded 1225 amino acids. Yeast two-hybrid assay, GST binding assay and immunoprecipitation demonstrated that delta-catenin interacted with a hydrophilic loop region in the endoproteolytic C-terminal fragment of PS1, but not with that of PS-2. These results suggest that PS1 and PS2 partly differ in function. PS1 loop fragment containing the pathogenic mutation retained the binding ability. We also found another armadillo-protein, p0071, interacted with PS1.</text>
      <annotation id="T1">
        <infon key="file">ann</infon>
        <infon key="type">protein</infon>
        <location length="13" offset="19" />
        <text>delta-catenin</text>
      </annotation>
      <annotation id="T3">
        <infon key="file">ann</infon>
        <infon key="type">protein</infon>
        <location length="17" offset="122" />
        <text>presenilin (PS) 1</text>
      </annotation>
      <annotation id="T2">
        <infon key="file">ann</infon>
        <infon key="type">protein</infon>
        <location length="12" offset="66" />
        <text>presenilin 1</text>
      </annotation>
      <relation id="R4">
        <infon key="relation type">Interaction</infon>
        <infon key="file">ann</infon>
        <infon key="type">Relation</infon>
        <node role="Arg1" refid="T12" />
        <node role="Arg2" refid="T13" />
      </relation>
      <relation id="R2">
        <infon key="relation type">Interaction</infon>
        <infon key="file">ann</infon>
        <infon key="type">Relation</infon>
        <node role="Arg1" refid="T3" />
        <node role="Arg2" refid="T4" />
      </relation>
      <relation id="R3">
        <infon key="relation type">Interaction</infon>
        <infon key="file">ann</infon>
        <infon key="type">Relation</infon>
        <node role="Arg1" refid="T5" />
        <node role="Arg2" refid="T6" />
      </relation>
      -
      <relation id="R1">
        <infon key="relation type">Interaction</infon>
        <infon key="file">ann</infon>
        <infon key="type">Relation</infon>
        <node role="Arg1" refid="T1" />
        <node role="Arg2" refid="T2" />
      </relation>
    </passage>
  </document>
</collection>

but when I use DOM for reading this XML file, I have some problems. 但是,当我使用DOM读取此XML文件时,出现了一些问题。 For example, for the annotation tag, it has 8 item tags in it, but when I print the result, it becomes 10 or more. 例如,对于annotation标签,它具有8个项目标签,但是当我打印结果时,它变为10个或更多。 And for the relation tag, it doesn't work right. 对于relation标签,它不起作用。 This is my Java code: 这是我的Java代码:

public class XMLRead {
    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException{
        try{
            File fXmlFile = new File("D:/THESIS/DataSet/Newfolder/Newfolder/aimed_bioc2.xml");
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
            Document doc = dBuilder.parse(fXmlFile);

            doc.getDocumentElement().normalize();

            System.out.println("Root element :" + doc.getDocumentElement().getNodeName());

            NodeList nList = doc.getElementsByTagName("document");

            System.out.println("OK----------------------------");

            for (int temp = 0; temp < nList.getLength(); temp++) {

                file1_Node nNode = nList.item(temp);
                file1_
                    file1_System.out.println("\nCurrent Element :" + nNode.getNodeName());
                file1_
                    file1_if (nNode.getNodeType() == Node.ELEMENT_NODE) {

                        Element eElement = (Element) nNode;

                        System.out.println("id : " + eElement.getElementsByTagName("id").item(0).getTextContent());

                        //                    NodeList nList2 = doc.getElementsByTagName("passage");
                        //                    for(int i=0; i< nList2.getLength(); i++)
                        //                    {
                        System.out.println("\toffset : " +  eElement.getElementsByTagName("offset").item(0).getTextContent());
                        System.out.println("\ttext: " + eElement.getElementsByTagName("text").item(0).getTextContent());                        
                        System.out.println("----------------------------");

                        NodeList nList3 = doc.getElementsByTagName("annotation");                       
                        for (int temp2 = 0; temp2 < nList3.getLength(); temp2++) {   
                            Node nNode2 = nList3.item(temp2);tln("\n\n");
                            if(nNode2.getNodeType() == Node.ELEMENT_NODE)  
                            {
                                Element eElement2 = (Element) nNode2;
                                System.out.println("\tannotation id : " + eElement2.getAttribute("id"));

                                NodeList nList4=doc.getElementsByTagName("infon");

                                Node nNode3=nList4.item(0);
                                Node nNode4=nList4.item(1);

                                Element eElement3= (Element) nNode3;                                             
                                Element eElement4= (Element) nNode4;
                                System.out.println("\t\tinfon key : " + eElement3.getAttribute("key")
                                        +",   infon : " +eElement.getElementsByTagName("infon").item(0).getTextContent());
                                System.out.println("\t\tinfon key : " + eElement4.getAttribute("key")
                                        + ",   infon : " +eElement.getElementsByTagName("infon").item(1).getTextContent());        

                                NodeList nList5 = doc.getElementsByTagName("location");                                                           
                                Node nNode5=nList5.item(temp2);

                                Element eElement5=(Element) nNode5;
                                System.out.println("\t\tLocation Lenght : " +eElement5.getAttribute("length")
                                        +"   ,Location offset : " + eElement5.getAttribute("offset"));

                                System.out.println("\t\tannotation text : "+ eElement2.getElementsByTagName("text").item(0).getTextContent());
                            }
                        }

                        System.out.println("----------------------------");

                        NodeList nList6 = doc.getElementsByTagName("relation");                       
                        for (int temp3 = 0; temp3 < nList6.getLength(); temp3++) {   
                            Node nNode6 = nList6.item(temp3);tln("\n\n");
                            if(nNode6.getNodeType() == Node.ELEMENT_NODE)  
                            {
                                Element eElement6 = (Element) nNode6;
                                System.out.println("\tRelation id : " + eElement6.getAttribute("id"));
                                Node nNode14=nList6.item(0);
                                Element eElement14=(Element) nNode14;
                                NodeList nList7=doc.getElementsByTagName("infon");
                                for(int temp5 = 0; temp5<nList7.getLength(); temp5++){
                                    Node nNode7=nList7.item(temp5);
                                    Node nNode8=nList7.item(1);
                                    Node nNode9=nList7.item(2);
                                    Element eElement7= (Element) nNode7;                                             
                                    Element eElement8= (Element) nNode8;
                                    Element eElement9= (Element) nNode9;
                                    System.out.println("\t\tinfon key : " + eElement7.getAttribute("key")
                                            +"    ,infon : " +eElement6.getElementsByTagName("infon").item(0).getTextContent());}
                                    System.out.println("\n\n");

                                    NodeList nList8 = doc.getElementsByTagName("node"); 
                                    for(int temp4=0; temp4<nList8.getLength(); temp4++)
                                    {
                                        Node nNode12 = nList8.item(temp4);
                                        Element eElement12 = (Element) nNode12;

                                        System.out.println("\t\tNode Role : " +eElement12.getAttribute("role")
                                                +"   ,refid : " + eElement12.getAttribute("refid"));
                                    }

                            }
                        }
                    }
                    // }
            }
        }
        catch (IOException e) {
            e.printStackTrace();
        }
    }

}

When your program handle child elements, doc.getElementsByTagName(tagname) will return match nodes of whole document. 当程序处理子元素时, doc.getElementsByTagName(tagname)将返回整个文档的匹配节点。 If this is not want you want, you shall review the code and fix it. 如果这不是您想要的,则应检查代码并进行修复。 For example: 例如:

NodeList nList6 = doc.getElementsByTagName("relation");                       
for (int temp3 = 0; temp3 < nList6.getLength(); temp3++) {   
    Node nNode6 = nList6.item(temp3);           
    System.out.println("\n\n");
    if(nNode6.getNodeType() == Node.ELEMENT_NODE)  
    {
        Element eElement6 = (Element) nNode6;
        System.out.println("\tRelation id : " + eElement6.getAttribute("id"));
        Node nNode14=nList6.item(0);
        Element eElement14=(Element) nNode14;
        //NodeList nList7=doc.getElementsByTagName("infon");
        //Correct call for getting all descendant elements of *eElement6*
        NodeList nList7=eElement6.getElementsByTagName("infon");
         //...
     }
     }
 }

This kind of usage is a good demonstration of how org.w3c.dom.* code usage can quickly become difficult to read. 这种用法很好地说明了org.w3c.dom。*代码用法如何很快变得难以阅读。 Parsing to a typed class structure is one way of avoiding it. 解析为类型化的类结构是避免它的一种方法。

Or, if you can use Java 8 my library Dynamics may provide a more readable way, along with null-safety & more descriptive error reporting. 或者,如果您可以使用Java 8,则我的库Dynamics可能会提供更易读的方式,以及null安全性和更具描述性的错误报告。

File fXmlFile = new File("D:/THESIS/DataSet/Newfolder/Newfolder/aimed_bioc2.xml");
XmlDynamic xml = new XmlDynamic(new FileReader(fXmlFile));

xml.get("collection").children()
    .filter(hasElementName("document"))
    .forEach(document -> {
        System.out.println("id : " + document.get("id").asString());
        System.out.println("\toffset : " +  document.get("passage|offset").asString());
        System.out.println("\ttext: " + document.get("passage|text").asString());
        System.out.println("----------------------------");

        Dynamic passage = document.get("passage");
        passage.children()
            .filter(hasElementName("annotation"))
            .forEach(annotation -> {
                System.out.println("\tannotation id : " + annotation.get("id").asString());

                annotation.children()
                    .filter(hasElementName("infon"))
                    .forEach(infon -> {
                        System.out.printf("\t\tinfon key : %s,   infon : %s%n",
                            infon.get("key").asString(), infon.asString());
                    });

                System.out.printf("\t\tlocation Length : %s,   location offset : %s%n",
                    annotation.get("location|length").asString(), annotation.get("location|offset").asString());

                System.out.println("\t\tannotation text : "+ annotation.get("text").asString());
            });

        System.out.println("----------------------------");

        passage.children()
            .filter(hasElementName("relation"))
            .forEach(relation -> {
                System.out.println("\trelation id : " + relation.get("id").asString());

                relation.children()
                    .filter(hasElementName("infon"))
                    .forEach(infon -> {
                        System.out.printf("\t\tinfon key : %s,   infon : %s%n",
                            infon.get("key").asString(), infon.asString());
                    });

                relation.children()
                    .filter(hasElementName("node"))
                    .forEach(node -> {
                        System.out.printf("\t\tnode role : %s,   refid : %s%n",
                            node.get("role").asString(), node.get("refid").asString());
                    });
            });
    });

Output: 输出:

id : AIMed_d30
    offset : 0
    text: Isolation of human delta-catenin and its binding specificity with presenilin 1. We screened proteins for interaction with presenilin (PS) 1, and cloned the full-length cDNA of human delta-catenin, which encoded 1225 amino acids. Yeast two-hybrid assay, GST binding assay and immunoprecipitation demonstrated that delta-catenin interacted with a hydrophilic loop region in the endoproteolytic C-terminal fragment of PS1, but not with that of PS-2. These results suggest that PS1 and PS2 partly differ in function. PS1 loop fragment containing the pathogenic mutation retained the binding ability. We also found another armadillo-protein, p0071, interacted with PS1.
----------------------------
    annotation id : T1
        infon key : file,   infon : ann
        infon key : type,   infon : protein
        location Length : 13,   location offset : 19
        annotation text : delta-catenin
    annotation id : T3
        infon key : file,   infon : ann
        infon key : type,   infon : protein
        location Length : 17,   location offset : 122
        annotation text : presenilin (PS) 1
    annotation id : T2
        infon key : file,   infon : ann
        infon key : type,   infon : protein
        location Length : 12,   location offset : 66
        annotation text : presenilin 1
----------------------------
    relation id : R4
        infon key : relation type,   infon : Interaction
        infon key : file,   infon : ann
        infon key : type,   infon : Relation
        node role : Arg1,   refid : T12
        node role : Arg2,   refid : T13
    relation id : R2
        infon key : relation type,   infon : Interaction
        infon key : file,   infon : ann
        infon key : type,   infon : Relation
        node role : Arg1,   refid : T3
        node role : Arg2,   refid : T4
    relation id : R3
        infon key : relation type,   infon : Interaction
        infon key : file,   infon : ann
        infon key : type,   infon : Relation
        node role : Arg1,   refid : T5
        node role : Arg2,   refid : T6
    relation id : R1
        infon key : relation type,   infon : Interaction
        infon key : file,   infon : ann
        infon key : type,   infon : Relation
        node role : Arg1,   refid : T1
        node role : Arg2,   refid : T2

see https://github.com/alexheretic/dynamics#xml-dynamics 参见https://github.com/alexheretic/dynamics#xml-dynamics

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM