Download an HTML page and encode it to a file

Question

I would like to download some web pages that use charset="UTF-8".
This page is a sample: http://en.wikipedia.org/wiki/Billboard_Year-End_Hot_100_singles_of_2003
I always end up with garbled special characters like this: BeyoncÃ© instead of Beyoncé
I tried the following code:

WebClient webClient = new WebClient();
webClient.Encoding = System.Text.Encoding.UTF8;
webClient.DownloadFile(url, fileName);

or this one:

WebClient client = new WebClient();
Byte[] pageData = client.DownloadData(url);
string pageHtml = Encoding.UTF8.GetString(pageData);
System.IO.File.WriteAllText(fileName, pageHtml);

What am I doing wrong?
I just want an easy way to download web pages and write them to files. Once that is done, I will extract data from these files, and I want the same characters I see on the original web page rather than garbled ones.

Answer 1

The problem is that the WriteAllText call does not specify an encoding for the file. Pass the encoding explicitly:

System.IO.File.WriteAllText(fileName, pageHtml, Encoding.UTF8);
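The garbled text in the question is what you get when UTF-8 bytes are read back as a single-byte encoding. The following is a minimal sketch of that effect and of the explicit-encoding write, not a definitive diagnosis of the original program. It assumes ISO-8859-1 as the wrong encoding, and the class name and temporary file name are hypothetical, chosen only for this demo.

```csharp
using System;
using System.IO;
using System.Text;

class EncodingDemo
{
    static void Main()
    {
        string original = "Beyonc\u00e9"; // "Beyoncé"

        // UTF-8 encodes 'é' as the two bytes 0xC3 0xA9.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        // Decoding those bytes as ISO-8859-1 (an assumed wrong encoding)
        // yields one character per byte: 'Ã' (U+00C3) and '©' (U+00A9).
        string misread = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);
        Console.WriteLine(misread == "Beyonc\u00c3\u00a9");

        // Hypothetical file name, used only for this demo.
        string fileName = Path.Combine(Path.GetTempPath(), "encoding-demo.html");

        // Passing Encoding.UTF8 writes a byte order mark (EF BB BF)
        // at the start of the file, followed by the UTF-8 text.
        File.WriteAllText(fileName, original, Encoding.UTF8);

        byte[] onDisk = File.ReadAllBytes(fileName);
        bool hasBom = onDisk.Length >= 3
            && onDisk[0] == 0xEF && onDisk[1] == 0xBB && onDisk[2] == 0xBF;
        Console.WriteLine(hasBom);

        // Reading the file back as UTF-8 returns the original string.
        Console.WriteLine(File.ReadAllText(fileName, Encoding.UTF8) == original);

        File.Delete(fileName);
    }
}
```

The byte order mark gives editors and other tools a hint that the file is UTF-8, which may be why the explicit overload makes the characters display correctly. If the file is later read by your own code, passing the same encoding when reading avoids relying on detection.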

Answer 1
1 Accepted 2015-01-31 12:29:49
