Splitting a huge XML file (>10 GB) into small chunks using a StAX parser

We have a scenario where we need to split a large XML file, more than 10 GB in size, into small chunks. Each chunk should contain 100 or 200 elements. Example XML:

<Employees>
  <Employee id="1">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>25</age>
    <name>Meghna</name>
    <gender>Female</gender>
    <role>Manager</role>
  </Employee>
  <Employee id="3">
    <age>29</age>
    <name>Pankaj</name>
    <gender>Male</gender>
    <role>Java Developer</role>
  </Employee>
  <Employee id="3">
    <age>35</age>
    <name>Lisa</name>
    <gender>Female</gender>
    <role>CEO</role>
  </Employee>
  <Employee id="3">
    <age>40</age>
    <name>Tom</name>
    <gender>Male</gender>
    <role>Manager</role>
  </Employee>
</Employees>

I have StAX parser code that splits the file into small chunks, but each output file contains only one complete Employee element, whereas I need 100, 200, or more <Employee> elements in a single file. Here is my Java code:

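import java.io.File;
import java.io.FileOutputStream;
import java.io.FileReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

// The class name is illustrative; the original post showed only the main method
public class SplitXml {
    public static void main(String[] s) throws Exception {
        // prefix, suffix and count are declared but not used yet
        String prefix = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" + "\n";
        String suffix = "\n</Employees>\n";
        int count = 0;
        try {
            int i = 0;
            XMLInputFactory xif = XMLInputFactory.newInstance();
            XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("D:\\Desktop\\Test\\latestxml\\test.xml"));
            xsr.nextTag(); // Advance to the root <Employees> element

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Each iteration copies exactly one <Employee> element into its own file
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File file = new File("C:\\Users\\test\\Desktop\\xml\\" + "out" + i + ".xml");
                try (FileOutputStream fos = new FileOutputStream(file, true)) {
                    t.transform(new StAXSource(xsr), new StreamResult(fos));
                }
                i++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}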

Do not increment i on every iteration; update it only when the iteration count reaches 100 or 200.

For example:

String outputPath = "/test/path/foo.txt";

while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
    // Append to the current chunk file
    FileOutputStream file = new FileOutputStream(outputPath, true);
    ...
    ...
    count++;
    if (count == 100) {
        // Chunk is full: switch to the next output file and reset the counter
        i++;
        outputPath = "/test/path/foo" + i + ".txt";
        count = 0;
    }
}
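The counter logic above can also be written without separate count and i variables, by deriving the file index from a running element index. This is only a minimal sketch: the class ChunkNaming and the helper chunkPath are hypothetical names, and the element index is assumed to be zero-based. Unlike the snippet above, it names the first file foo0.txt rather than foo.txt.

```java
final class ChunkNaming {
    // Hypothetical helper: maps the zero-based index of an element to its chunk file.
    static String chunkPath(String base, int elementIndex, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        // Integer division: indexes 0..99 go to file 0, 100..199 to file 1, and so on
        int fileIndex = elementIndex / chunkSize;
        return base + fileIndex + ".txt";
    }
}
```

With a chunk size of 100, chunkPath("/test/path/foo", 99, 100) gives /test/path/foo0.txt and chunkPath("/test/path/foo", 100, 100) gives /test/path/foo1.txt.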

I hope I understood this correctly: you only need to increment count each time you add one employee.

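        // Without this, every transform() call writes its own XML declaration into the chunk
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        File file = new File("out" + i + ".xml");
        FileOutputStream fos = new FileOutputStream(file, true);
        appendStuff("<Employees>", file);
        while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
            count++;
            t.transform(new StAXSource(xsr), new StreamResult(fos));
            if (count == 100) {
                // Chunk is full: close it and start the next file
                count = 0;
                i++;
                appendStuff("</Employees>", file);
                fos.close();
                file = new File("out" + i + ".xml");
                fos = new FileOutputStream(file, true);
                appendStuff("<Employees>", file);
            }
        }
        // Close the last chunk (it holds no employees if the total is an exact multiple of 100)
        appendStuff("</Employees>", file);
        fos.close();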

It's not very nice, but you get the idea.
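One detail worth checking when several transform() results are appended to the same file: a Transformer writes an XML declaration by default, so each copied element would start with its own <?xml ...?> line. The sketch below shows the OutputKeys.OMIT_XML_DECLARATION output property on a single fragment. The class FragmentCopy and the helper firstChild are hypothetical names used only for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

final class FragmentCopy {
    // Hypothetical helper: serializes the first child of the root element as a string.
    static String firstChild(String xml, boolean omitDeclaration)
            throws XMLStreamException, TransformerException {
        XMLStreamReader xsr = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        xsr.nextTag(); // move to the root element
        xsr.nextTag(); // move to the first child element
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitDeclaration ? "yes" : "no");
        StringWriter out = new StringWriter();
        // Copies only the element the reader is currently positioned on
        t.transform(new StAXSource(xsr), new StreamResult(out));
        return out.toString();
    }
}
```

Calling firstChild with omitDeclaration set to true returns just the element markup, which is what you want when wrapping many elements in a single <Employees> root.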

private static void appendStuff(String content, File file) throws IOException {
    FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
    BufferedWriter bw = new BufferedWriter(fw);
    bw.write(content);
    bw.close();
}
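For comparison, here is a self-contained sketch of the same chunking idea using the StAX event API (XMLEventReader and XMLEventWriter) instead of a Transformer plus a separate FileWriter. It is one possible approach, not the method from the answers above, and it rests on several assumptions: the records are direct children of the root element, the root element has no namespace, and attributes on the root are not copied. The class name XmlChunkSplitter, the method split, and the out0.xml, out1.xml, ... file names are illustrative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class XmlChunkSplitter {

    /** Writes out0.xml, out1.xml, ... into outDir and returns the number of files written. */
    static int split(Path input, Path outDir, int chunkSize)
            throws IOException, XMLStreamException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();
        Files.createDirectories(outDir);

        int files = 0;          // chunk files opened so far
        int inChunk = 0;        // records written to the current chunk
        int depth = 0;          // element nesting depth: 1 = root, 2 = record
        String rootName = null; // local name of the root, reused as the wrapper element
        OutputStream out = null;
        XMLEventWriter writer = null;

        try (InputStream in = Files.newInputStream(input)) {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    depth++;
                    if (depth == 1) {
                        rootName = event.asStartElement().getName().getLocalPart();
                    } else if (depth == 2 && writer == null) {
                        // Open the next chunk lazily so no empty trailing file is created
                        out = Files.newOutputStream(outDir.resolve("out" + files++ + ".xml"));
                        writer = outFactory.createXMLEventWriter(out, "UTF-8");
                        writer.add(events.createStartDocument("UTF-8", "1.0"));
                        writer.add(events.createStartElement("", "", rootName));
                    }
                }
                if (depth >= 2) {
                    writer.add(event); // copy every event that belongs to a record
                }
                if (event.isEndElement()) {
                    depth--;
                    if (depth == 1 && ++inChunk == chunkSize) {
                        finish(writer, events, rootName);
                        out.close();
                        out = null;
                        writer = null;
                        inChunk = 0;
                    }
                }
            }
            reader.close();
            if (writer != null) { // last, partially filled chunk
                finish(writer, events, rootName);
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
        return files;
    }

    // Closes the wrapper element and flushes the chunk to its output stream
    private static void finish(XMLEventWriter writer, XMLEventFactory events, String rootName)
            throws XMLStreamException {
        writer.add(events.createEndElement("", "", rootName));
        writer.add(events.createEndDocument());
        writer.flush();
        writer.close();
    }

    public static void main(String[] args) throws Exception {
        // Usage: java XmlChunkSplitter <input.xml> <outputDir> <chunkSize>
        int n = split(Paths.get(args[0]), Paths.get(args[1]), Integer.parseInt(args[2]));
        System.out.println("Wrote " + n + " chunk file(s)");
    }
}
```

Because the input is read as a stream of events and each chunk is written as it goes, memory use should stay roughly constant regardless of the input size. Each chunk is opened only when its first record arrives, so an input whose record count is an exact multiple of chunkSize does not produce an empty last file.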

 