The best way to Compress XML

I need to compress a very large xml file to the smallest possible size.

I work in C#, and I would prefer an open source library or an application that I can call from my code, but I can work from an algorithm as well.

Thank you!

It may not be the "smallest size possible", but you could use System.IO.Compression to compress it. Zipping tends to provide very good compression for text.

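// Requires: using System.IO; using System.IO.Compression;
using (var fileStream = File.OpenWrite(...))
using (var zipStream = new GZipStream(fileStream, CompressionMode.Compress))
{
    // Bytes written here are gzip-compressed on their way to the file.
    zipStream.Write(...);
}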
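For a large file, a streaming variant of the snippet above might look like the following. This is only a sketch: the class name `XmlGzip`, the method names, and the path parameters are made up for illustration, and it assumes .NET 4.5 or later for the `CompressionLevel` overload. It uses `File.Create` rather than `File.OpenWrite` because `File.OpenWrite` does not truncate an existing file.

```csharp
using System.IO;
using System.IO.Compression;

public static class XmlGzip
{
    // Compresses sourcePath into destinationPath using gzip (deflate).
    public static void CompressFile(string sourcePath, string destinationPath)
    {
        using (var input = File.OpenRead(sourcePath))
        using (var output = File.Create(destinationPath)) // truncates any existing file
        using (var gzip = new GZipStream(output, CompressionLevel.Optimal))
        {
            // CopyTo streams in chunks, so the whole file is never held in memory.
            input.CopyTo(gzip);
        }
    }

    // Reverses CompressFile.
    public static void DecompressFile(string sourcePath, string destinationPath)
    {
        using (var input = File.OpenRead(sourcePath))
        using (var gzip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = File.Create(destinationPath))
        {
            gzip.CopyTo(output);
        }
    }
}
```

Calling `XmlGzip.CompressFile("data.xml", "data.xml.gz")` would then write the compressed copy next to the original; the file names here are placeholders.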

As stated above, Efficient XML Interchange (EXI) achieves the best available XML compression pretty consistently. Even without schemas, it is not uncommon for EXI to be 2-5 times smaller than zip. With schemas, you'll do even better.

If you're not opposed to a commercial implementation, you can use the .NET version of Efficient XML and call it directly from your C# code using standard .NET APIs. You can download a free trial copy from http://www.agiledelta.com/efx_download.html

If you have a schema available for the XML file, you could try EXIficient. It is an implementation of the Efficient XML Interchange (EXI) format, which is pretty much the best available general-purpose XML compression method. If you don't have a schema, EXI is still better than regular zip (the deflate algorithm, that is), but not by much, especially for large files.

EXIficient is only Java but you can probably make it into an application that you can call. I'm not aware of any open-source implementations of EXI in C#.

Take a look at XML compression tools; you can also compress it with SharpZipLib.

File size is not the only advantage of EXI (or any binary scheme). The processing time and memory overhead are also greatly reduced when reading and writing it. Imagine a program that writes floating point numbers to disk by simply copying their bytes. Now imagine another program that converts the floating point numbers to formatted text, pastes them into a text stream, and then feeds that stream through an expensive compression algorithm. Because of this overhead, XML is basically unusable for very large files that could have been processed effortlessly in a binary representation.
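To make the floating point comparison concrete, here is a rough sketch that encodes the same values both ways. The class `FloatEncodingDemo` and the `<v>` element are invented for illustration and do not come from EXI or any other binary XML format; the point is only that each double takes exactly 8 bytes in raw form, while the text form needs the digits plus the markup around them.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;

public static class FloatEncodingDemo
{
    // Writes each double as its raw 8 bytes.
    public static byte[] ToBinary(double[] values)
    {
        using (var buffer = new MemoryStream())
        using (var writer = new BinaryWriter(buffer))
        {
            foreach (var v in values) writer.Write(v);
            writer.Flush();
            return buffer.ToArray();
        }
    }

    // Writes each double as round-trippable text inside a hypothetical <v> element.
    public static byte[] ToXmlText(double[] values)
    {
        var sb = new StringBuilder();
        foreach (var v in values)
        {
            sb.Append("<v>")
              .Append(v.ToString("R", CultureInfo.InvariantCulture))
              .Append("</v>");
        }
        return Encoding.UTF8.GetBytes(sb.ToString());
    }
}
```

Comparing `ToBinary(values).Length` with `ToXmlText(values).Length` for a realistic array shows the size gap before any compression is applied; the text path also pays for formatting and parsing on every read and write.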

Binary XML promises to address this longstanding weakness of XML. It would be very easy to make a utility that converts between binary/text representations (without knowing the XML schema), which means you can still edit the files easily when you want to.

XML is highly compressible. You can use DotNetZip to produce compressed zip files from your XML.

If you require the maximum compression level, I would recommend LZMA. An SDK (including a C# version) is available as part of the open source 7-Zip project.

If you are looking for the smallest possible size, then try Fast Infoset as the binary XML encoding and compress the result using BZIP2 or LZMA. You will probably get better results than compressing text XML or using EXI. FastInfoset.NET includes implementations of the Fast Infoset standard and several compression formats to choose from, but it is commercial.
