
Convert an XML file to CSV file using Java

I need help understanding the steps involved in converting an XML file into a CSV file using Java. Here is an example of an XML file:

<?xml version="1.0"?>
<Sites>
<Site id="101" name="NY-01" location="New York">
    <Hosts>
        <Host id="1001">
           <Host_Name>srv001001</Host_Name>
           <IP_address>10.1.2.3</IP_address>
           <OS>Windows</OS>
           <Load_avg_1min>1.3</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1002">
           <Host_Name>srv001002</Host_Name>
           <IP_address>10.1.2.4</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>1.4</Load_avg_1min>
           <Load_avg_5min>2.5</Load_avg_5min>
           <Load_avg_15min>1.2</Load_avg_15min>
        </Host>
        <Host id="1003">
           <Host_Name>srv001003</Host_Name>
           <IP_address>10.1.2.5</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>3.3</Load_avg_1min>
           <Load_avg_5min>1.6</Load_avg_5min>
           <Load_avg_15min>1.8</Load_avg_15min>
        </Host>
        <Host id="1004">
           <Host_Name>srv001004</Host_Name>
           <IP_address>10.1.2.6</IP_address>
           <OS>Linux</OS>
           <Load_avg_1min>2.3</Load_avg_1min>
           <Load_avg_5min>4.5</Load_avg_5min>
           <Load_avg_15min>4.2</Load_avg_15min>
        </Host>     
    </Hosts>
</Site>
</Sites>

And here is the resulting CSV file:

site_id, site_name, site_location, host_id, host_name, ip_address, operative_system, load_avg_1min, load_avg_5min, load_avg_15min
101, NY-01, New York, 1001, srv001001, 10.1.2.3, Windows, 1.3, 2.5, 1.2
101, NY-01, New York, 1002, srv001002, 10.1.2.4, Linux, 1.4, 2.5, 1.2
101, NY-01, New York, 1003, srv001003, 10.1.2.5, Linux, 3.3, 1.6, 1.8
101, NY-01, New York, 1004, srv001004, 10.1.2.6, Linux, 2.3, 4.5, 4.2

I was thinking of using a DOM parser to read the XML file. The problem is that I would need to specify the elements by name in the code, and I want it to be able to parse the file without doing that.

Are there any tools or libraries in Java that would help me achieve this?

What if I have an XML file in the format below and want the value of InitgPty in the same row as MsgId? (Please note: InitgPty is at the next tag level, so its value is printed on the next row.)

<?xml version="1.0"?>
<CstmrCdtTrfInitn>
<GrpHdr>
<MsgId>XYZ07/ABC</MsgId>
<NbOfTxs>100000</NbOfTxs>
<InitgPty>
<Nm>XYZ</Nm>
</InitgPty>

Here's a working example; data.xml contains your data:

import java.io.File;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

class Xml2Csv {

    public static void main(String[] args) throws Exception {
        File stylesheet = new File("src/main/resources/style.xsl");
        File xmlSource = new File("src/main/resources/data.xml");

        // Parse the XML input into a DOM tree.
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document document = builder.parse(xmlSource);

        // Compile the stylesheet into a Transformer.
        StreamSource stylesource = new StreamSource(stylesheet);
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(stylesource);

        // Apply the stylesheet to the DOM tree and write the CSV output.
        Source source = new DOMSource(document);
        Result outputTarget = new StreamResult(new File("/tmp/x.csv"));
        transformer.transform(source, outputTarget);
    }
}

style.xsl:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format" >
<xsl:output method="text" omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:text>Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min&#xA;</xsl:text>
<xsl:for-each select="//Host">
<xsl:value-of select="concat(Host_Name,',',IP_address,',',OS,',',Load_avg_1min,',',Load_avg_5min,',',Load_avg_15min,'&#xA;')"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Output:

Host_Name,IP_address,OS,Load_avg_1min,Load_avg_5min,Load_avg_15min
srv001001,10.1.2.3,Windows,1.3,2.5,1.2
srv001002,10.1.2.4,Linux,1.4,2.5,1.2
srv001003,10.1.2.5,Linux,3.3,1.6,1.8
srv001004,10.1.2.6,Linux,2.3,4.5,4.2

Three steps:

  1. Parse the XML file into a Java XML library object.
  2. Retrieve the relevant data from the object for each row.
  3. Write the results to a text file using native Java functions, saving it with a .csv extension.
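The three steps can be sketched without hard-coding element names. This is only a minimal sketch under stated assumptions, not a definitive converter: the class GenericXmlToCsv and its methods are hypothetical names, the rule "an element whose children are all leaf elements becomes one row" is an assumption about the data, and attribute columns are named elementName_attributeName purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of any library.
class GenericXmlToCsv {

    static String convert(String xml) throws Exception {
        // Step 1: parse the XML into a DOM Document.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Step 2: walk the tree and collect one name -> value map per row.
        List<Map<String, String>> rows = new ArrayList<>();
        collect(doc.getDocumentElement(), new LinkedHashMap<>(), rows);

        // Step 3: render CSV text; the header is the union of all keys, first seen first.
        List<String> header = new ArrayList<>();
        for (Map<String, String> row : rows) {
            for (String key : row.keySet()) {
                if (!header.contains(key)) header.add(key);
            }
        }
        StringBuilder csv = new StringBuilder(String.join(",", header)).append('\n');
        for (Map<String, String> row : rows) {
            List<String> cells = new ArrayList<>();
            for (String column : header) cells.add(quote(row.getOrDefault(column, "")));
            csv.append(String.join(",", cells)).append('\n');
        }
        return csv.toString();
    }

    // Values seen on the way down (attributes, leaf children) are inherited by deeper rows.
    private static void collect(Element element, Map<String, String> inherited,
                                List<Map<String, String>> rows) {
        Map<String, String> values = new LinkedHashMap<>(inherited);
        NamedNodeMap attributes = element.getAttributes(); // DOM does not guarantee attribute order
        for (int i = 0; i < attributes.getLength(); i++) {
            values.put(element.getTagName() + "_" + attributes.item(i).getNodeName(),
                    attributes.item(i).getNodeValue());
        }
        List<Element> nested = new ArrayList<>();
        for (Element child : childElements(element)) {
            if (childElements(child).isEmpty()) {
                values.put(child.getTagName(), child.getTextContent().trim());
            } else {
                nested.add(child);
            }
        }
        if (nested.isEmpty()) {
            rows.add(values); // only leaf children: this element is one CSV row
        } else {
            for (Element child : nested) collect(child, values, rows);
        }
    }

    private static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        NodeList nodes = parent.getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (nodes.item(i) instanceof Element) result.add((Element) nodes.item(i));
        }
        return result;
    }

    // Quote a cell only when it contains a comma, a double quote or a line break.
    private static String quote(String value) {
        if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
            return '"' + value.replace("\"", "\"\"") + '"';
        }
        return value;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Sites><Site id=\"101\"><Hosts>"
                + "<Host id=\"1001\"><Host_Name>srv001001</Host_Name><OS>Windows</OS></Host>"
                + "<Host id=\"1002\"><Host_Name>srv001002</Host_Name><OS>Linux</OS></Host>"
                + "</Hosts></Site></Sites>";
        System.out.print(convert(xml));
    }
}
```

Because leaf values of an ancestor are carried down to deeper rows, a value nested one level deeper ends up on the same row as its parent's leaf values. Known limitations of this sketch: repeated leaf names inside one element overwrite each other, and attributes on leaf elements are ignored.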

Your best bet is to use XSLT to "transform" the XML to CSV. There are some Q&As on SO that cover how to do this. The key is to provide a schema for your source data so the XSLT transform process knows how to read it and can properly format the results.

Then you can use Xalan to read the XML, apply the XSLT, and output your results.

The answer has already been provided by Pedantic (using a DOM-like approach, i.e. the Document Object Model) and Jono (with a SAX-like approach) in January.

My opinion is that both methods work well for small files, but the latter works better with big XML files, since a SAX parser streams through the document instead of building the whole tree in memory. You didn't mention the actual size of your XML files, but you should take this into account.
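To illustrate the SAX-style approach, here is a minimal sketch. It assumes each record element (the hypothetical recordTag parameter, for example "Host") contains only flat leaf elements, and it collects the output in a StringBuilder for brevity; for a really big file you would write each row to a Writer instead. The class name SaxRowWriter is made up for this example, and cells are not quoted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; emits one CSV line per record element.
class SaxRowWriter extends DefaultHandler {
    private final String recordTag;                        // element that maps to one CSV row
    private final StringBuilder out = new StringBuilder();
    private final StringBuilder text = new StringBuilder();
    private List<String> cells;                            // non-null only while inside a record

    SaxRowWriter(String recordTag) {
        this.recordTag = recordTag;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals(recordTag)) cells = new ArrayList<>();
        text.setLength(0); // drop whitespace collected between elements
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may be called several times per text node
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (cells == null) return; // outside any record
        if (qName.equals(recordTag)) {
            out.append(String.join(",", cells)).append('\n');
            cells = null;
        } else {
            cells.add(text.toString().trim()); // a leaf inside the record
        }
        text.setLength(0);
    }

    static String convert(String xml, String recordTag) throws Exception {
        SaxRowWriter handler = new SaxRowWriter(recordTag);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Hosts><Host id=\"1001\"><Host_Name>srv001001</Host_Name>"
                + "<OS>Windows</OS></Host></Hosts>";
        System.out.print(convert(xml, "Host"));
    }
}
```

Only the current row is held in memory at any time, which is why this style scales to large inputs. The price is that the record element has to be named up front.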

Whichever method is used, a specific program (one that detects tags tailored to your local XML) will be easier to write but won't work for another XML flavor without code changes, while a more generic program will be harder to devise but will work for all XML files. You said you wanted to be able to parse a file without specifying element names, so I guess the generic approach is what you prefer. I agree with that, but please note that it's easier said than done. Indeed, I had the same problem in January, this time involving a big XML file (>>100 MB), and I was surprised that nothing was available on the Internet. Turning frustration into something better is always a good thing, so I decided to deal with that problem myself in the most generic way, with special attention to the big-XML-file issue.

You might be interested to know that the generic Java library I wrote, which is now published as free software, converted your XML file into CSV the way you expected (in -x -u mode; please refer to the documentation for further information).

So the answer to the last part of your question is: yes, there is at least one library that will help you achieve your goal: mine, which is named "XML2CSV-Generic-Converter". There might be other ones of course, and certainly better ones, but I couldn't find any decent (free) one myself.

I won't provide any link here, to comply with Peter Foti's judicious remark, but if you type "XML2CSV-Generic-Converter" into your favorite search engine you should find it easily.

Your file looks really flat and simple. You don't necessarily need an XML parser to convert it. Just read it line by line with LineNumberReader.readLine() and use regular expressions to extract specific fields.
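A rough sketch of that line-by-line idea follows. It assumes every leaf element sits on its own line and that a record ends with a closing </Host> tag on a line of its own; the class name RegexLineScan and both patterns are made up for this example. Note that this is not real XML parsing: entities such as &amp;amp; are not decoded and elements spanning several lines are missed.

```java
import java.io.IOException;
import java.io.LineNumberReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; only suitable for flat, line-oriented XML.
class RegexLineScan {
    // A one-line leaf element such as <OS>Linux</OS>; group 2 is the text.
    private static final Pattern LEAF = Pattern.compile("\\s*<(\\w+)>([^<]*)</\\1>\\s*");
    // Assumed record terminator: a line holding only </Host>.
    private static final Pattern RECORD_END = Pattern.compile("\\s*</Host>\\s*");

    static String convert(String xml) throws IOException {
        StringBuilder csv = new StringBuilder();
        List<String> cells = new ArrayList<>();
        try (LineNumberReader reader = new LineNumberReader(new StringReader(xml))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher leaf = LEAF.matcher(line);
                if (leaf.matches()) {
                    cells.add(leaf.group(2).trim());
                } else if (RECORD_END.matcher(line).matches()) {
                    // End of a record: flush the collected cells as one CSV row.
                    csv.append(String.join(",", cells)).append('\n');
                    cells.clear();
                }
            }
        }
        return csv.toString();
    }

    public static void main(String[] args) throws IOException {
        String xml = "<Host id=\"1001\">\n"
                + "   <Host_Name>srv001001</Host_Name>\n"
                + "   <OS>Windows</OS>\n"
                + "</Host>\n";
        System.out.print(convert(xml));
    }
}
```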

Another option is to use StAX, a streaming API for XML processing. It's pretty simple, and you don't need to load the whole document into RAM.
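A minimal StAX sketch could look like the following. It pulls events from an XMLStreamReader and emits one CSV line per record element. The class name StaxRowReader and the recordTag parameter are assumptions made for this example, the record's children are assumed to be text-only elements (getElementText() throws otherwise), and cells are not quoted.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration; pulls events instead of building a tree.
class StaxRowReader {

    static String convert(String xml, String recordTag) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        StringBuilder csv = new StringBuilder();
        List<String> cells = null; // non-null only while inside a record
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if (reader.getLocalName().equals(recordTag)) {
                    // New record: start the row with its attribute values.
                    cells = new ArrayList<>();
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        cells.add(reader.getAttributeValue(i));
                    }
                } else if (cells != null) {
                    // Text-only child: read its content and move to its end tag.
                    cells.add(reader.getElementText().trim());
                }
            } else if (event == XMLStreamConstants.END_ELEMENT && cells != null
                    && reader.getLocalName().equals(recordTag)) {
                csv.append(String.join(",", cells)).append('\n');
                cells = null;
            }
        }
        reader.close();
        return csv.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Hosts><Host id=\"1001\"><Host_Name>srv001001</Host_Name>"
                + "<OS>Windows</OS></Host></Hosts>";
        System.out.print(convert(xml, "Host"));
    }
}
```

Compared with a callback-based parser, the loop keeps control in your own code, which tends to make it easier to stop early or skip parts of the document.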

Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.