
The best way to compress XML

I need to compress a very large XML file to the smallest possible size.

I work in C#, and I'd prefer an open-source library or an application that I can call from my code, but I can work from an algorithm as well.

Thank you!

It may not be the "smallest size possible", but you could use System.IO.Compression to compress it. Zipping tends to provide very good compression for text.

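// Requires: using System.IO; using System.IO.Compression;
// File.Create truncates an existing file; File.OpenWrite would leave stale trailing bytes behind.
using (var fileStream = File.Create(...))
using (var zipStream = new GZipStream(fileStream, CompressionMode.Compress))
{
    zipStream.Write(...);
}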
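To make the snippet above concrete, here is a minimal sketch of file-to-file gzip compression with System.IO.Compression. The class and method names (XmlGzip, CompressFile, DecompressFile) are made up for illustration, and the choice of CompressionLevel.Optimal is an assumption about what "smallest" should mean here; it trades speed for size.

```csharp
using System.IO;
using System.IO.Compression;

// Hypothetical helper class; the names are illustrative only.
public static class XmlGzip
{
    // Compresses sourcePath into destinationPath using gzip (DEFLATE).
    public static void CompressFile(string sourcePath, string destinationPath)
    {
        using (var input = File.OpenRead(sourcePath))
        // File.Create truncates any existing file at destinationPath.
        using (var output = File.Create(destinationPath))
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            // CopyTo streams the data in chunks, so a very large XML file
            // is never held in memory all at once.
            input.CopyTo(gzip);
        }
    }

    // Reverses CompressFile: reads the gzip file and writes the original bytes.
    public static void DecompressFile(string sourcePath, string destinationPath)
    {
        using (var input = File.OpenRead(sourcePath))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = File.Create(destinationPath))
        {
            gzip.CopyTo(output);
        }
    }
}
```

Calling XmlGzip.CompressFile("data.xml", "data.xml.gz") (file names are placeholders) would write the compressed copy next to the original. Streaming this way matters more than the compression level when the input is very large.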

As another answer notes, Efficient XML Interchange (EXI) achieves the best available XML compression pretty consistently. Even without schemas, it is not uncommon for EXI output to be 2-5 times smaller than zip. With schemas, you'll do even better.

If you're not opposed to a commercial implementation, you can use the .NET version of Efficient XML and call it directly from your C# code using standard .NET APIs. You can download a free trial copy from http://www.agiledelta.com/efx_download.html.

If you have a schema available for the XML file, you could try EXIficient. It is an implementation of the Efficient XML Interchange (EXI) format, which is pretty much the best available general-purpose XML compression method. If you don't have a schema, EXI is still better than regular zip (the deflate algorithm, that is), but not by much, especially for large files.

EXIficient is Java-only, but you can probably wrap it in an application that you can call from your code. I'm not aware of any open-source implementations of EXI in C#.

Take a look at XML compression tools; you can also compress the file using SharpZipLib.

File size is not the only advantage of EXI (or any binary scheme). The processing time and memory overhead are also greatly reduced when reading or writing it. Imagine a program that writes floating-point numbers to disk by simply copying their bytes. Now imagine another program that converts the floating-point numbers to formatted text, pastes them into a text stream, and then feeds that stream through an expensive compression algorithm. Because of this ridiculous overhead, XML is basically unusable for very large files that could have been processed effortlessly with a binary representation.
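The comparison in the previous paragraph can be sketched in a few lines. This is only an illustration of the two approaches, not how EXI itself encodes numbers; the class name and the element names "values" and "v" are made up for the example.

```csharp
using System.Globalization;
using System.IO;
using System.Text;

// Hypothetical demo class; the names are illustrative only.
public static class FloatEncodingDemo
{
    // Binary approach: each double is written as its raw 8-byte representation.
    public static byte[] ToBinary(double[] values)
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer))
        {
            foreach (var v in values)
            {
                writer.Write(v);
            }
            writer.Flush();
            return buffer.ToArray();
        }
    }

    // Text approach: each double is formatted as round-trippable text
    // and wrapped in an XML element before being encoded as UTF-8.
    public static byte[] ToXmlText(double[] values)
    {
        var sb = new StringBuilder("<values>");
        foreach (var v in values)
        {
            sb.Append("<v>")
              .Append(v.ToString("R", CultureInfo.InvariantCulture))
              .Append("</v>");
        }
        sb.Append("</values>");
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}
```

The binary form is always 8 bytes per value and needs no parsing on the way back in, while the text form pays for digit formatting, markup, and later parsing, before any compression is applied.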

Binary XML promises to address this longstanding weakness of XML. It would be very easy to make a utility that converts between the binary and text representations (without knowing the XML schema), which means you can still edit the files easily when you want to.

XML is highly compressible. You can use DotNetZip to produce compressed zip files from your XML.

If you require the maximum compression level, I would recommend LZMA. There is an SDK (including C#) that is part of the open-source 7-Zip project.

If you are looking for the smallest possible size, try Fast Infoset as the binary XML encoding and then compress the result using BZIP2 or LZMA. You will probably get better results than by compressing text XML or using EXI. FastInfoset.NET includes implementations of the Fast Infoset standard and several compression formats to choose from, but it's commercial.


 