Download and Unzip XML file
I would like to unzip and parse an XML file located here.
Here is my code:
HttpClientHandler handler = new HttpClientHandler()
{
    CookieContainer = new CookieContainer(),
    UseCookies = true,
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate,
    // | DecompressionMethods.None,
};
using (var http = new HttpClient(handler))
{
    var response =
        http.GetAsync(@"https://login.tradedoubler.com/report/published/aAffiliateEventBreakdownReportWithPLC_806880712_4446152766894956100.xml.zip").Result;
    Stream streamContent = response.Content.ReadAsStreamAsync().Result;
    using (var gZipStream = new GZipStream(streamContent, CompressionMode.Decompress))
    {
        var settings = new XmlReaderSettings()
        {
            DtdProcessing = DtdProcessing.Ignore
        };
        var reader = XmlReader.Create(gZipStream, settings);
        reader.MoveToContent();
        XElement root = XElement.ReadFrom(reader) as XElement;
    }
}
I get an exception on XmlReader.Create(gZipStream, settings):
"The magic number in GZip header is not correct. Make sure you are passing in a GZip stream."
To double-check that I am getting properly formatted data from the web, I grab the response content and save it to a file:
byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
File.WriteAllBytes(@"C:\temp\1111.zip", byteContent);
When I examine 1111.zip, it appears to be a well-formed zip file containing the XML that I need.
I was advised here that I do not need GZipStream at all, but if I remove the compression stream from the code completely and pass streamContent directly to the XML reader, I get an exception:
"Data at the root level is invalid. Line 1, position 1."
Whether I decompress the stream or not, I still fail to parse this file. What am I doing wrong?
After you save the stream to a local folder, unzip it with the ZipFile class. Something like this:
// On .NET Framework, ZipFile requires a reference to the System.IO.Compression.FileSystem assembly.
byte[] byteContent = response.Content.ReadAsByteArrayAsync().Result;
string filename = @"C:\temp\1111.zip";
File.WriteAllBytes(filename, byteContent);
string destinationDir = @"c:\temp";
// Placeholder name: use the name of the XML entry actually stored in the archive.
string xmlFilename = "report.xml";
System.IO.Compression.ZipFile.ExtractToDirectory(filename, destinationDir);
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(Path.Combine(destinationDir, xmlFilename));
//xml reading goes here...
The file in question is encoded in PKZip format, not GZip format.
You'll need a different library to decompress it, such as System.IO.Compression.ZipFile.
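If writing the archive to disk is not desirable, the same idea can be sketched in memory with System.IO.Compression.ZipArchive. This is a minimal sketch, not a definitive implementation: the class name ZippedXml and the method LoadFirstXml are hypothetical, and it assumes the archive contains at least one entry whose name ends in .xml. The DtdProcessing.Ignore setting is carried over from the question.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper, named here for illustration only.
public static class ZippedXml
{
    // Opens the stream as a PKZip archive and parses the first entry
    // whose name ends in ".xml" (an assumption about the archive layout).
    public static XElement LoadFirstXml(Stream zipStream)
    {
        using (var archive = new ZipArchive(zipStream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = archive.Entries.FirstOrDefault(
                e => e.FullName.EndsWith(".xml", StringComparison.OrdinalIgnoreCase));
            if (entry == null)
                throw new InvalidDataException("No .xml entry found in the archive.");

            // Same reader settings as in the question.
            var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Ignore };
            using (Stream entryStream = entry.Open())
            using (XmlReader reader = XmlReader.Create(entryStream, settings))
            {
                return XElement.Load(reader);
            }
        }
    }
}
```

With the byteContent array from the question, a call might look like: using (var ms = new MemoryStream(byteContent)) { XElement root = ZippedXml.LoadFirstXml(ms); }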
You can typically tell the encoding by the file extension. PKZip files often use .zip, while GZip files often use .gz.
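When the extension is missing or untrustworthy, the leading bytes of the content are another hint; this is the "magic number" that the GZipStream exception in the question refers to. The sketch below checks the two well-known signatures (PKZip data starts with the bytes 'P' 'K', GZip data with 0x1F 0x8B). The class name CompressionSniffer and the returned labels are hypothetical, chosen only for illustration.

```csharp
using System;

// Hypothetical helper, named here for illustration only.
public static class CompressionSniffer
{
    // Returns "pkzip", "gzip", or "unknown" based on the first two bytes.
    public static string Detect(byte[] data)
    {
        if (data == null || data.Length < 2) return "unknown";
        // PKZip signatures begin with 'P' 'K' (0x50 0x4B).
        if (data[0] == 0x50 && data[1] == 0x4B) return "pkzip";
        // GZip members begin with 0x1F 0x8B.
        if (data[0] == 0x1F && data[1] == 0x8B) return "gzip";
        return "unknown";
    }
}
```

Calling CompressionSniffer.Detect(byteContent) on the downloaded bytes before choosing a decompressor would show which of the two formats the server actually returned.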
See: Unzip files programmatically in .net