Split a 1 GB XML file using Java
I have a 1 GB XML file. How can I split it into smaller, well-formed XML files using Java?
Here is an example:
<records>
<record id="001">
<name>john</name>
</record>
....
</records>
Thanks.
I would use a StAX parser for this situation. It avoids reading the entire document into memory at once.
Code Example
For the following XML, output each "statement" element into a file named after the value of its "account" attribute:
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
This can be done with the following code:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
public class Demo {
    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newInstance();
        // A byte stream lets the parser honour the encoding declared in the XML prolog
        try (InputStream in = new FileInputStream("input.xml")) {
            XMLStreamReader xsr = xif.createXMLStreamReader(in);
            xsr.nextTag(); // Advance to the <statements> root element
            TransformerFactory tf = TransformerFactory.newInstance();
            Transformer t = tf.newTransformer();
            new File("out").mkdirs(); // Make sure the output directory exists
            // Stops on each <statement> start tag; the loop ends at </statements>
            while (xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
                File file = new File("out/" + xsr.getAttributeValue(null, "account") + ".xml");
                // Copies only the current <statement> subtree, leaving the reader on its end tag
                t.transform(new StAXSource(xsr), new StreamResult(file));
            }
            xsr.close();
        }
    }
}
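The code above writes one file per element. If you would rather bound the size of each output file by grouping a fixed number of records, the same streaming idea can be applied to the `<records>`/`<record>` layout from the question. This is a minimal sketch, not a library API: the class name `ChunkSplitter`, its `split` method, and the `part-N.xml` file names are hypothetical. It assumes the repeating elements are direct children of the root and that the document uses no namespaces; each chunk is wrapped in an element with the original root's name so every output file is well-formed.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams the input and writes `perFile` children of the root per file.
public class ChunkSplitter {
    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    /** Writes part-0.xml, part-1.xml, ... into outDir and returns the number of files. */
    public static int split(InputStream in, Path outDir, int perFile) throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
        Files.createDirectories(outDir);
        String rootName = null;
        XMLEventWriter writer = null;
        OutputStream out = null;
        int files = 0, inChunk = 0, depth = 0;
        while (reader.hasNext()) {
            XMLEvent e = reader.nextEvent();
            if (e.isStartElement()) {
                depth++;
                if (depth == 1) {
                    // Remember the root name (e.g. "records"); assumes no namespace prefix
                    rootName = e.asStartElement().getName().getLocalPart();
                } else if (depth == 2 && writer == null) {
                    // First record of a new chunk: open the next output file
                    out = Files.newOutputStream(outDir.resolve("part-" + files++ + ".xml"));
                    writer = outFactory.createXMLEventWriter(out, "UTF-8");
                    writer.add(EVENTS.createStartDocument("UTF-8", "1.0"));
                    writer.add(EVENTS.createStartElement("", "", rootName));
                }
            }
            // Copy each record and everything nested inside it; skip text between records
            if (depth >= 2 && writer != null) {
                writer.add(e);
            }
            if (e.isEndElement()) {
                if (depth == 2 && ++inChunk == perFile) {
                    finish(writer, out, rootName); // chunk is full
                    writer = null;
                    inChunk = 0;
                }
                depth--;
            }
        }
        if (writer != null) {
            finish(writer, out, rootName); // last, partially filled chunk
        }
        reader.close();
        return files;
    }

    // Closes the wrapper root element, then the file (XMLEventWriter.close() leaves the stream open)
    private static void finish(XMLEventWriter writer, OutputStream out, String rootName)
            throws XMLStreamException, IOException {
        writer.add(EVENTS.createEndElement("", "", rootName));
        writer.add(EVENTS.createEndDocument());
        writer.close();
        out.close();
    }
}
```

For example, passing `new FileInputStream("input.xml")`, an output directory, and `10000` would put 10,000 records in each file. Memory use stays roughly constant because only the current event is held, at the cost of doing the element bookkeeping by hand.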
Try this, using Saxon-EE 9.3.
<xsl:stylesheet version="3.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:mode streamable="yes"/>
<xsl:template match="record">
<xsl:result-document href="record-{@id}.xml">
<xsl:copy-of select="."/>
</xsl:result-document>
</xsl:template>
</xsl:stylesheet>
The software isn't free, but if it saves you a day of coding, the investment is easy to justify. (Apologies for the sales pitch.)
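DOM, StAX, and SAX will all work, but each has its own pros and cons. DOM builds the whole document tree in memory, which is usually impractical for a 1 GB file, whereas SAX and StAX stream through it.
Hope this helps.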
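To make the SAX option concrete, here is a minimal sketch of a SAX handler that writes each `<record>` to its own `record-<id>.xml` file (the same naming as the XSLT above). `SaxSplitter` is a hypothetical name, and the sketch assumes records are direct children of the root, every record has an `id` attribute, and the document uses no namespaces. Because SAX only delivers callbacks, the handler has to track depth and rebuild the markup itself (here with the JDK's `XMLStreamWriter`), which is part of why SAX tends to be more awkward than StAX for this job.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX handler: each direct child of the root goes to record-<id>.xml.
public class SaxSplitter extends DefaultHandler {
    private final Path outDir;
    private final XMLOutputFactory outFactory = XMLOutputFactory.newInstance();
    private XMLStreamWriter writer;
    private OutputStream out;
    private int depth = 0; // 1 = root, 2 = record, 3+ = record content
    private int files = 0;

    SaxSplitter(Path outDir) { this.outDir = outDir; }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        depth++;
        try {
            if (depth == 2) {
                // Assumes every record carries an id attribute
                out = Files.newOutputStream(outDir.resolve("record-" + atts.getValue("id") + ".xml"));
                writer = outFactory.createXMLStreamWriter(out, "UTF-8");
                writer.writeStartDocument("UTF-8", "1.0");
                files++;
            }
            if (depth >= 2) {
                // Non-namespace-aware parser: qName holds the element name
                writer.writeStartElement(qName);
                for (int i = 0; i < atts.getLength(); i++) {
                    writer.writeAttribute(atts.getQName(i), atts.getValue(i));
                }
            }
        } catch (IOException | XMLStreamException e) {
            throw new SAXException(e);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depth >= 2) { // skip whitespace between records
            try {
                writer.writeCharacters(ch, start, length); // escapes &, < and >
            } catch (XMLStreamException e) {
                throw new SAXException(e);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        try {
            if (depth >= 2) writer.writeEndElement();
            if (depth == 2) { // end of a record: finish and close its file
                writer.writeEndDocument();
                writer.close();
                out.close();
            }
        } catch (IOException | XMLStreamException e) {
            throw new SAXException(e);
        }
        depth--;
    }

    /** Splits the input and returns the number of files written. */
    public static int split(InputStream in, Path outDir) throws Exception {
        Files.createDirectories(outDir);
        SaxSplitter handler = new SaxSplitter(outDir);
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.files;
    }
}
```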
I respectfully disagree with Blaise Doughan. SAX is not only hard to use but also slow. With VTD-XML, you can use XPath to simplify the processing logic (a 10x reduction in code is very common), and it is also much faster because there is no redundant encoding/decoding conversion. Below is the Java code using VTD-XML:
import java.io.FileOutputStream;
import com.ximpleware.*;
public class Split {
    public static void main(String[] args) throws Exception {
        VTDGen vg = new VTDGen();
        // parseFile reads a local file; parseHttpUrl is meant for http:// URLs
        if (vg.parseFile("c:\\xml\\input.xml", true)) {
            VTDNav vn = vg.getNav();
            AutoPilot ap = new AutoPilot(vn);
            ap.selectXPath("/records/record");
            byte[] xml = vn.getXML().getBytes(); // the raw bytes of the whole document
            int j = 0;
            while (ap.evalXPath() != -1) {
                // Lower 32 bits = byte offset, upper 32 bits = byte length of the element
                long l = vn.getElementFragment();
                try (FileOutputStream out = new FileOutputStream("out" + j + ".xml")) {
                    out.write(xml, (int) l, (int) (l >> 32));
                }
                j++;
            }
        }
    }
}