Download and Unzip XML file

I would like to download, unzip, and parse a zipped XML file (the URL is in the code below).

Here is my code:

HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
   // | DecompressionMethods.None,

};

using (var http = new HttpClient(handler))
{

    var response =
         http.GetAsync(@"https://login.tradedoubler.com/report/published/aAffiliateEventBreakdownReportWithPLC_806880712_4446152766894956100.xml.zip").Result;

    Stream streamContent = response.Content.ReadAsStreamAsync().Result;

    using (var gZipStream = new GZipStream(streamContent, CompressionMode.Decompress))
    {
        var settings = new XmlReaderSettings()
        {
            DtdProcessing = DtdProcessing.Ignore
        };

        var reader = XmlReader.Create(gZipStream, settings);
        reader.MoveToContent();

        XElement root = XElement.ReadFrom(reader) as XElement;
    }
}

I get an exception on XmlReader.Create(gZipStream, settings):

The magic number in GZip header is not correct. Make sure you are passing in a GZip stream.

To double-check that I am getting properly formatted data from the web, I grab the stream and save it to a file:

byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
File.WriteAllBytes(@"C:\temp\1111.zip", byteContent);

When I examine 1111.zip, it is a well-formed zip file containing the XML that I need.

I was advised here that I do not need GZipStream at all, but if I remove the compression stream from the code completely and pass streamContent directly to the XML reader, I get an exception:

"Data at the root level is invalid. Line 1, position 1."

With or without the decompression stream, I still fail to parse this file. What am I doing wrong?

After you save the stream to a local folder, unzip it with the ZipFile class. Something like this:

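    // Save the downloaded archive to disk.
    byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
    string filename = @"C:\temp\1111.zip";
    File.WriteAllBytes(filename, byteContent);

    string destinationDir = @"c:\temp";
    string xmlFilename = "report.xml"; // name of the XML entry inside the archive

    // Extract every entry of the ZIP archive into destinationDir.
    // Note: this overload throws an IOException if an extracted file already exists there.
    System.IO.Compression.ZipFile.ExtractToDirectory(filename, destinationDir);

    // Load the extracted XML file.
    XmlDocument xmlDoc = new XmlDocument();
    xmlDoc.Load(Path.Combine(destinationDir, xmlFilename));

    //xml reading goes here...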

The file in question is encoded in PKZip format, not GZip format.

You'll need a different class to decompress it, such as System.IO.Compression.ZipFile.
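If writing to disk is not desirable, the same idea can be sketched in memory with System.IO.Compression.ZipArchive. This is only a sketch under a few assumptions: the class and method names (ZippedXml, LoadFirstXmlEntry, DownloadAsync) are made up for illustration, and it assumes the archive contains at least one entry whose name ends in .xml. Note that HttpClientHandler.AutomaticDecompression only undoes HTTP Content-Encoding (gzip/deflate applied to the response in transit); it does not open a .zip archive, so the archive still has to be read explicitly.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;
using System.Xml.Linq;

public static class ZippedXml
{
    // Opens the first *.xml entry of the ZIP archive in zipStream and parses it.
    // Assumption: the archive holds at least one entry ending in ".xml".
    public static XElement LoadFirstXmlEntry(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(
                e => e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));
            if (entry == null)
                throw new InvalidDataException("The archive contains no .xml entry.");

            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream, settings))
            {
                // XElement.Load advances to the root element and reads it fully.
                return XElement.Load(reader);
            }
        }
    }

    // Downloads the archive into memory, then parses its XML entry.
    public static async Task<XElement> DownloadAsync(HttpClient http, string url)
    {
        byte[] zipBytes = await http.GetByteArrayAsync(url);
        using (var buffer = new MemoryStream(zipBytes))
        {
            return LoadFirstXmlEntry(buffer);
        }
    }
}
```

With the question's client, a call along the lines of `XElement root = await ZippedXml.DownloadAsync(http, url);` would replace the GZipStream block. Buffering the whole download in a MemoryStream keeps the stream seekable for ZipArchive; for very large reports, saving to a temporary file first may be the better trade-off.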

You can typically tell the format from the file extension: PKZip files usually use .zip, while GZip files usually use .gz.
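When the extension is missing or untrustworthy, the leading bytes of the file are a more reliable hint, and they are exactly what the "magic number" in the GZip exception refers to. The helper below is a minimal sketch; the names ArchiveSniffer and Detect are hypothetical. It relies on two well-known signatures: a non-empty ZIP archive starts with the bytes 'P' 'K' 0x03 0x04, and a GZip stream starts with 0x1F 0x8B.

```csharp
using System;

public static class ArchiveSniffer
{
    // Returns "zip", "gzip", or "unknown" based on the leading signature bytes.
    public static string Detect(byte[] data)
    {
        if (data == null) throw new ArgumentNullException(nameof(data));

        // ZIP local file header signature: 'P' 'K' 0x03 0x04
        // (an empty archive starts with a different PK record and is reported as "unknown").
        if (data.Length >= 4 &&
            data[0] == 0x50 && data[1] == 0x4B && data[2] == 0x03 && data[3] == 0x04)
        {
            return "zip";
        }

        // GZip member header: 0x1F 0x8B
        if (data.Length >= 2 && data[0] == 0x1F && data[1] == 0x8B)
        {
            return "gzip";
        }

        return "unknown";
    }
}
```

Calling `ArchiveSniffer.Detect(byteContent)` on the downloaded bytes from the question would be expected to report a ZIP archive, which is consistent with both errors: GZipStream rejects the "PK" signature, and the XML reader sees "PK" where it expects "<" at line 1, position 1.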

See: Unzip files programmatically in .net
