
Split a 1GB XML file using Java

I have a 1GB XML file. How can I split it into smaller, well-formed XML files using Java?

Here is an example:

<records>
  <record id="001">
    <name>john</name>
  </record>
 ....
</records>

Thanks.

I would use a StAX parser for this situation. It avoids reading the entire document into memory at once.

  1. Advance the XMLStreamReader to the local root element of the sub-fragment.
  2. Use the javax.xml.transform APIs to produce a new document from this XML fragment. This advances the XMLStreamReader to the end of that fragment.
  3. Repeat step 1 for the next fragment.

Code Example

For the following XML, output each "statement" element into a file named after the value of its "account" attribute:

<statements>
   <statement account="123">
      ...stuff...
   </statement>
   <statement account="456">
      ...stuff...
   </statement>
</statements>

This can be done with the following code:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public class Demo {

    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // Pass a byte stream (not a FileReader) so the parser takes the
        // encoding from the XML declaration instead of the platform default
        try (InputStream in = new FileInputStream("input.xml")) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            xsr.nextTag(); // Advance to statements element

            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            // Loop while the next tag is the start of another statement element
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                // The "out" directory must already exist
                File file = new File("out/" + xsr.getAttributeValue(null, "account") + ".xml");
                t.transform(new StAXSource(xsr), new StreamResult(file));
            }
            xsr.close();
        }
    }

}
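
If one file per element is too fine-grained for a 1GB input, a variation is to group several records into each output file under a copy of the root element, so every part stays well-formed. The following is only a sketch under stated assumptions: the class name BatchSplitter, the split method, the batchSize parameter and the part-N.xml file names are illustrative and not part of the answer above. It uses the StAX event API (XMLEventReader and XMLEventWriter) rather than StAXSource, ignores any namespace on the root element, and expects the output directory to exist.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

public class BatchSplitter {

    /** Copies the children of the root element into files holding at most batchSize children each; returns the number of files written. */
    public static int split(Reader input, Path outDir, int batchSize)
            throws XMLStreamException, IOException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(input);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        XMLEventFactory events = XMLEventFactory.newInstance();

        String rootName = null;      // local name of the input root, reused as the wrapper element
        int depth = 0;               // current element nesting depth in the input
        int inBatch = 0;             // records already written to the current file
        int fileCount = 0;
        OutputStream out = null;
        XMLEventWriter writer = null;

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                depth++;
                if (depth == 1) {    // the root itself is not copied, only remembered
                    rootName = event.asStartElement().getName().getLocalPart();
                    continue;
                }
                if (depth == 2 && writer == null) {   // first record of a new batch
                    out = Files.newOutputStream(outDir.resolve("part-" + fileCount++ + ".xml"));
                    writer = outputFactory.createXMLEventWriter(out, "UTF-8");
                    writer.add(events.createStartDocument("UTF-8", "1.0"));
                    writer.add(events.createStartElement("", "", rootName));
                }
            }
            if (depth >= 2) {
                writer.add(event);   // copy everything inside a record verbatim
            }
            if (event.isEndElement()) {
                depth--;
                boolean batchFull = depth == 1 && ++inBatch == batchSize;
                boolean inputDone = depth == 0 && writer != null;
                if (batchFull || inputDone) {         // close the wrapper and the file
                    writer.add(events.createEndElement("", "", rootName));
                    writer.add(events.createEndDocument());
                    writer.close();
                    out.close();
                    writer = null;
                    inBatch = 0;
                }
            }
        }
        reader.close();
        return fileCount;
    }
}
```

A call such as BatchSplitter.split(Files.newBufferedReader(Paths.get("input.xml")), Paths.get("out"), 10000) would then write out/part-0.xml, out/part-1.xml and so on, with only one event held in memory at a time.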

Try this, using Saxon-EE 9.3.

<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:mode streamable="yes"/>
    <xsl:template match="record">
      <xsl:result-document href="record-{@id}.xml">
        <xsl:copy-of select="."/>
      </xsl:result-document>
    </xsl:template>
</xsl:stylesheet>

The software isn't free, but if it saves you a day's coding you can easily justify the investment. (Apologies for the sales pitch.)

DOM, StAX, and SAX will all do the job, but each has its own pros and cons.

  1. With DOM, you can't hold all of the data in memory.
  2. Programming control is easiest with DOM, then StAX, then SAX.
  3. A combination of SAX and DOM is a better option.
  4. Using a framework that already does this may be the best option. Have a look at Smooks: http://www.smooks.org

Hope this helps.

I respectfully disagree with Blaise Doughan. SAX is not only hard to use, but also very slow. With VTD-XML, you can use XPath to simplify the processing logic (a 10x code reduction is very common), and it is also much faster because there is no redundant encoding/decoding conversion. Below is the Java code with VTD-XML:

import java.io.FileOutputStream;
import com.ximpleware.*;

public class split {
    public static void main(String[] args) throws Exception {
        VTDGen vg = new VTDGen();
        // parseFile reads a local path; the second argument enables namespace awareness
        if (vg.parseFile("c:\\xml\\input.xml", true)) {
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("/records/record");
            byte[] xml = vn.getXML().getBytes(); // raw bytes of the whole document
            int j = 0;
            while (ap.evalXPath() != -1) {
                // Lower 32 bits: byte offset of the element; upper 32 bits: its length
                long l = vn.getElementFragment();
                try (FileOutputStream out = new FileOutputStream("out" + j + ".xml")) {
                    out.write(xml, (int) l, (int) (l >> 32));
                }
                j++;
            }
        }
    }
}
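
The write call above depends on how getElementFragment packs two numbers into one long: the code treats the lower 32 bits as the byte offset and the upper 32 bits as the length. The sketch below only illustrates that bit arithmetic on a plain byte array and does not use VTD-XML at all; the FragmentBounds class and its pack, offset and length helpers are hypothetical names introduced for the illustration.

```java
import java.nio.charset.StandardCharsets;

public class FragmentBounds {

    /** Byte offset of the fragment: the lower 32 bits of the packed value. */
    static int offset(long fragment) {
        return (int) fragment;
    }

    /** Byte length of the fragment: the upper 32 bits of the packed value. */
    static int length(long fragment) {
        return (int) (fragment >> 32);
    }

    /** Hypothetical helper that builds a packed value the same way, for demonstration only. */
    static long pack(int offset, int length) {
        return ((long) length << 32) | (offset & 0xFFFFFFFFL);
    }

    public static void main(String[] args) {
        byte[] doc = "<records><record id=\"001\"/></records>".getBytes(StandardCharsets.US_ASCII);
        // The record element starts at byte 9 and is 18 bytes long
        long fragment = pack(9, 18);
        String record = new String(doc, offset(fragment), length(fragment), StandardCharsets.US_ASCII);
        System.out.println(record); // prints <record id="001"/>
    }
}
```

Because the offsets index the original bytes, each fragment is copied without being decoded and re-encoded.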
