XML vs comma delimited text files

Ok, I've read a couple of books on XML and written programs to spit it out and whatnot. But here's the question. Both a comma delimited file and an XML file are "human readable." But in general, the comma delimited file is much easier on my eyes than an XML file; the tags typically take up as much space as the data, if not more. This just seems to obscure what I'm reading, and the format can take a page to hold the same information that fits on a single line of text in a comma delimited file. And a comma delimited file is significantly less complex to parse. So the real question is: why XML? Just because all the cool kids are doing it?

Advantages

A number of advantages XML has over CSV:

  • Hierarchical data organization
  • Automatic data validation (XML Schemas or DTDs)
  • Easily convert formats (using XSL)
  • Easy to identify relational structure
  • Can be used in combination with XML-RPC
  • Suitable for object persistence (marshalling)
  • Simplifies business-to-business communications
  • Helpful related technologies (XPath, DOM)
  • Tight integration with modern Web browsers
  • Extract, Transform, and Load (ETL) tools
  • Backwards file format compatibility (version attribute)
  • Digital signatures

It completely depends on the problem domain and what you are trying to solve.

Example

Tight integration with Web browsers is something that many people miss when writing web pages. Consider the situation where you have a large data store of songs. Songs have artists, albums, beats per minute, and so forth. You could export the data to XML, write a simple stylesheet to render the XML as XHTML, then point the browser at the XML page. The browser will render the XML as a web page.

You cannot do that with CSV.

Disadvantages

Joel Spolsky has a great article on why XML is a poor choice as a complex data store: it is slow. (Unlike a database, which can retrieve previous or next records with a single CPU instruction, traversing records in an XML document is much slower.) Arguably, this could be considered an optimization problem, resolved by waiting 18 months for faster hardware. Thus:

  • Slower to parse than other formats
  • Syntactical redundancy can detract from readability
  • Document bloat could affect storage costs
  • Cannot easily model overlapping (non-hierarchical) data structures
  • Poorly designed XML file formats are not uncommon (in my experience; citation needed)

Related Question

See also: Why Should I Use A Human Readable File Format.

These aren't the only two options. You can also use JSON or YAML, which are much lighter weight than XML.

In general, if you have simple tabular data without many special characters, CSV isn't a bad choice. For structured data, consider using one of the other three.
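To give a rough feel for the "lighter weight" claim, here is a minimal Python sketch that writes one nested record as both XML and JSON using only the standard library. The `song` record and its field names are made up for illustration, and the length comparison says something only about this toy record, not about the formats in general.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record, used only for illustration.
song = {"title": "Example Song", "artist": "Example Artist", "bpm": 120, "tags": ["rock", "live"]}

def to_xml(record):
    """Build a <song> element by hand; every level of nesting needs its own tags."""
    root = ET.Element("song")
    ET.SubElement(root, "title").text = record["title"]
    ET.SubElement(root, "artist").text = record["artist"]
    ET.SubElement(root, "bpm").text = str(record["bpm"])
    tags = ET.SubElement(root, "tags")
    for tag in record["tags"]:
        ET.SubElement(tags, "tag").text = tag
    return ET.tostring(root, encoding="unicode")

xml_text = to_xml(song)
json_text = json.dumps(song)

print(xml_text)
print(json_text)
print(len(xml_text), len(json_text))
```

For this record the JSON text comes out shorter because it has no closing tags, and `json.loads(json_text)` gives back the original dictionary with the number still a number, whereas the XML version stores `bpm` as text.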

XML supports complex, structured and hierarchical representation of things. That's far from what CSV can store trivially.

Think about a complex object graph in an object oriented environment. It can be serialized as an XML document pretty easily but CSV cannot handle such a thing.

It all depends on what you need to do. If you need more complexity in your data structures than a simple "flat" row structure can give, for example hierarchical data, then XML is a great choice.

Well, XML is human readable and human editable. You can look at an XML file and know exactly what it is. A CSV file is human readable, but without a header row you don't really know what each value means.

For example, if we're storing user accounts, which would you prefer?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>my@email.com</email>
</user>

OR

ryeguy,abc123,3-4-08,my@email.com

Of course, this is just an example, but imagine it with 30 fields or so!

Or worse yet, what if we make subfields?

<user>
    <username>ryeguy</username>
    <password>abc123</password>
    <regdate>3-4-08</regdate>
    <email>my@email.com</email>
    <posts>
        <post>
            <id>34</id>
            ....
        </post>
    </posts>
</user>

That would be a pain in the ass to put in a CSV. Soon you'd be making your own querying language.
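As a rough illustration of the point about nesting and querying, here is a small Python sketch that reads a document shaped like the one above with the standard library's `xml.etree.ElementTree`. The second post and the `<title>` fields are invented for the example; they are not part of the original snippet.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the snippet above; the second post and the
# <title> fields are invented for illustration.
doc = """
<user>
    <username>ryeguy</username>
    <email>my@email.com</email>
    <posts>
        <post><id>34</id><title>First post</title></post>
        <post><id>35</id><title>Second post</title></post>
    </posts>
</user>
"""

user = ET.fromstring(doc)

# Child elements are addressed by name, so field order does not matter.
username = user.findtext("username")

# A path expression walks the nesting; no home-grown query language needed.
post_ids = [int(post.findtext("id")) for post in user.findall("posts/post")]

# ElementTree supports a small XPath subset, including child-text predicates.
second = user.find("posts/post[id='35']")

print(username, post_ids, second.findtext("title"))
```

ElementTree implements only a limited subset of XPath, so anything beyond simple paths and predicates like these would need a fuller XPath library.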

The fact that XML is human readable does not mean that it was designed to be read (or even edited) directly by humans.

XML has a nice set of properties that make it a good choice in many cases: validation, a well-defined standard, a lot of tools, a very flexible architecture, and a natural mapping to the tree model that many programs use. These properties inevitably bring an additional burden, so XML fits best when you have the human resources to deal with it. Its human readability is an added value that simplifies debugging (try debugging a binary file...), inspection, and small changes in trivial cases.

CSV, on the other hand, is easy, quick and linear, although many dialects exist, and parsing it well is far from trivial (with the added problem that it looks trivial!). For most applications involving tables of data, CSV is the perfect choice.
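A short Python sketch of why parsing CSV only looks trivial: splitting on commas breaks as soon as a field contains the delimiter, while a real CSV parser handles the quoting. The sample line is made up, and the quoting convention shown (double quotes around the field, embedded quotes doubled) is the default dialect of Python's `csv` module; other CSV dialects may escape differently.

```python
import csv
import io

# One record whose second field contains a comma and doubled quotes.
# The data is made up for illustration.
line = 'ryeguy,"Smith, John ""JJ""",my@email.com\n'

# Naive approach: looks fine until a field contains the delimiter.
naive = line.rstrip("\n").split(",")

# The csv module understands quoted fields and doubled quotes.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # four pieces, with stray quote characters left in
print(parsed)  # three fields, as intended
```

The naive split returns four pieces for a three-field record, which is exactly the kind of bug that goes unnoticed until the data contains a comma.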

In general, however, there are data representation problems you can solve with XML but cannot solve with CSV (for example, a tree). On the other hand, any data that can be represented in CSV can also be represented in XML, although there is no guarantee that the XML form will be more efficient in terms of space, ease of parsing and so on, and in practice it usually is not. It's a matter of the "degrees of freedom" of your format. XML has more degrees of freedom; CSV has fewer. The hype behind XML is partly due to this fact.

Don't fall victim to the hammer syndrome: when you have a hammer (XML), everything looks like a nail (something that you have to solve with XML). Reality is much different and more nuanced. XML is cool, but it's not the answer to every problem.

CSV was never really a standard, just the same quick and dirty method a bunch of people came up with independently. Of course, some of these people were smarter than others and realized you needed to escape characters, but others didn't. Even MSSQL exports CSVs improperly. There is a documented RIGHT way to do XML, so if you're doing it right and someone's application or whatever isn't accepting it, you have some clout when you say "That's not my fault."

XML will describe the content and also has a ton of supporting libraries in a variety of languages... but it can be bloated. If the receiving end of the CSV is aware of the layout and the data is tabular, I don't see anything wrong with it.

XML can be validated against a contract (a schema or DTD).

XML also has complementary technologies around it: XmlDom, XPath, XSLT, XSD, XML Schemas.

Among the reasons you may prefer XML over CSV (depending on the task at hand, of course):

  • Almost all platforms and languages have existing libraries for reading, writing, parsing, and manipulating XML.
  • XML has well-defined rules for encoding all characters. CSV has ambiguities, such as how to encode commas that are part of the data.
  • XML supports a variety of data shapes (like hierarchical), whereas CSV is most useful when the data looks like a table (rows and columns).

I like to think of the primary distinction in this case as this: XML is TREE-based, while CSV is TABLE-based.

That is, you can nest and re-nest and omit and generally make a complex TREE structure in XML, whereas you can only make simple 2D tables in CSV.
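One way to see the tree/table difference is to try forcing nested data into CSV. The Python sketch below flattens a made-up list of users, each owning a list of posts, into rows. The names `users` and `post_id` are assumptions for the example, and flattening by repeating the parent's fields is just one possible workaround (a second file keyed by username would be another).

```python
import csv
import io

# Hypothetical nested data: each user owns a list of posts.
users = [
    {"username": "ryeguy", "posts": [34, 35]},
    {"username": "other", "posts": []},
]

# To fit a table, the tree is flattened: one row per (user, post) pair,
# with the parent's fields repeated on every row.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["username", "post_id"])
for user in users:
    # A user with no posts still needs a row, so an empty cell stands in.
    for post_id in user["posts"] or [""]:
        writer.writerow([user["username"], post_id])

print(buf.getvalue())
```

The username is repeated for every post, and a user without posts needs a placeholder row. In XML both cases fall out of the nesting: the `<posts>` element simply has two children or none.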
