

Reading XML vs. reading a CSV file in Java

Which is quicker and performs better?

Reading XML with DocumentBuilder or CSV with FileReader/BufferedReader in Java?

I don't know about performance, but one factor is how easy it is to find standard, well-used parsers. There's an XML parser built into the JDK now, but I'm not aware of a built-in CSV parser. I think XML is far more ubiquitous than CSV.

Another factor is the nature of the data: XML suggests hierarchical data, while CSV suggests tables. I think the "best" way to read in data depends more on something like this.

I can't speak to quicker builds, easy maintenance, or performance, though I'm guessing it really depends on how you're using the documents being parsed; e.g. reading document nodes would be way faster than CSV, while loading a document might be faster in CSV. All that said, CSV is evil, meaning it's a highly unstable data store. XML has more overhead, but is way, way more stable.

Related question: When and why is XML preferable to CSV?

Reading a CSV file with the FileReader class is faster, as the reader only reads the file and parsing the values is quite an easy step.
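To make the "quite easy step" concrete, here is a minimal sketch of line-by-line CSV reading with BufferedReader. The class name `CsvSketch` and the sample data are made up for illustration, and the naive `split(",")` assumes there are no quoted fields or embedded commas, which real-world CSV often has.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CsvSketch {
    // Reads every non-blank line and splits it on commas.
    // Assumption: no quoted fields, no embedded commas or newlines.
    static List<String[]> readRows(Reader source) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                rows.add(line.split(",", -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // With a real file this would be readRows(new FileReader("data.csv")),
        // where data.csv is a hypothetical path.
        List<String[]> rows = readRows(new StringReader("id,name\n1,Ann\n2,Bob\n"));
        System.out.println(rows.size() + " rows, first name: " + rows.get(1)[1]);
    }
}
```

Passing a Reader rather than a file name keeps the sketch testable without touching the disk; for anything beyond simple data, a dedicated CSV library is probably the safer choice.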

On the other hand, reading an XML file using a DOM parser or SAXParser is slower because processing XML data is a much more complicated step (in the JDK, DocumentBuilder is the DOM entry point: its parse method reads a document into a tree, and it can also create new documents). XML files also tend to be very verbose.
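For comparison, a rough sketch of reading XML through the JDK's DOM API. `DocumentBuilder.parse` loads the whole document into an in-memory tree before any value can be read, which is part of the extra cost. The class name `DomSketch` and the `name` element are assumptions chosen for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

class DomSketch {
    // Parses the XML into a DOM tree and collects the text of every <name> element.
    static List<String> readNames(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("name");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record><name>Ann</name></record>"
                + "<record><name>Bob</name></record></records>";
        System.out.println(readNames(xml));
    }
}
```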

The advantage of an XML file is that you can put more stress on data validation (when using XSD to define the XML structure), i.e. testing the values for correctness when reading the file. Also, one can edit the XML file without any further explanation, as the XML element names (and possible comments) say more than the semicolons in a CSV file.
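As an illustration of XSD validation, the sketch below checks a document against a small inline schema using the JDK's `javax.xml.validation` package. The schema, the `record`/`id`/`name` elements, and the class name `XsdSketch` are invented for the example; a real project would normally load the XSD from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

class XsdSketch {
    // Hypothetical schema: a <record> holding an integer <id> and a string <name>.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='record'><xs:complexType><xs:sequence>"
          + "<xs:element name='id' type='xs:int'/>"
          + "<xs:element name='name' type='xs:string'/>"
          + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    // Returns true if the XML conforms to the schema above, false otherwise.
    static boolean isValid(String xml) throws IOException, SAXException {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(isValid("<record><id>1</id><name>Ann</name></record>"));
        System.out.println(isValid("<record><id>abc</id><name>Ann</name></record>"));
    }
}
```

The second document fails because `abc` is not an `xs:int`, the kind of value check that a plain CSV reader would have to implement by hand.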

I agree with both blunders and duffymo. I just wanted to add the following.

As was already said, both are data formats, so think about your data. How large and how complicated is it? If it is hierarchical, forget about CSV. If it is not very large, do the same.

When thinking about XML, remember that DOM is not the only way to parse it. SAX is faster. And you can use Digester (built on top of SAX), which lets you define the mapping between your data model and XML schema using XML, and then runs very quickly.
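A small sketch of the SAX approach, for contrast with DOM: the parser streams through the document and calls back on each element, so no tree is built in memory. The class name `SaxSketch` and the element names are assumptions for illustration, and Digester itself is not shown here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxSketch {
    // Streams through the XML and counts start tags with the given name.
    static int countElements(String xml, String elementName) throws Exception {
        int[] count = {0}; // array so the anonymous handler can update it
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attributes) {
                        if (qName.equals(elementName)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<records><record/><record/><record/></records>";
        System.out.println(countElements(xml, "record"));
    }
}
```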

If your data is very large and your parser must be very fast, check out JSON. It should be faster than XML because it is less verbose.

I've been wondering the same. I just did a crude test using Excel to read and parse a simple file with 8,000 records. The XML load took ~8 seconds. The CSV load took less than 1 second.

I think that CSV is a perfectly valid choice for simple tabular data, and carries a lot less overhead. XML is great for more complex scenarios...
