Special characters come out wrong with WebClient DownloadString even when using the page's encoding

Question

EDIT: The characters come through correctly at first, but in the middle of the page there is this line: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN" "http://www.w3.org/TR/REC-html40/strict.dtd">. After that line, special characters such as é (which are displayed fine in a browser) come through as eacute; (without the &) when downloaded via WebClient. END EDIT

I am extracting an excerpt from a web page using WebClient + RegEx.

But even with the encoding set correctly, é still comes out as eacute;, ç as ccedil;, í as iacute;, etc.

I followed the "DownloadString and Special Characters" example to set the charset correctly (ISO-8859-1):

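using System.Text;
using System.Text.RegularExpressions;

System.Net.WebClient wc = new System.Net.WebClient();
wc.DownloadString("https://myurl"); // first request, made only to populate ResponseHeaders
var contentType = wc.ResponseHeaders["Content-Type"];
var charset = Regex.Match(contentType, "charset=([^;]+)").Groups[1].Value;
wc.Encoding = Encoding.GetEncoding(charset); // ISO-8859-1 for this page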

It does set the charset to match the document's (ISO-8859-1), but when I do the follow-up DownloadString (I know I could set the encoding beforehand and do just one wc.DownloadString, but I wanted to follow the accepted answer's example):

string result = wc.DownloadString("https://myurl");

The special characters still come out wrong.
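
For reference, the single-request variant mentioned above could look roughly like the following. This is only a sketch, not the code from the linked answer: the names PageDownloader, CharsetFrom and DownloadWithPageCharset are hypothetical, and falling back to UTF-8 when the header declares no charset is an assumption. It downloads the raw bytes once and decodes them with the charset taken from the Content-Type header.

```csharp
using System;
using System.Net;
using System.Text;
using System.Text.RegularExpressions;

static class PageDownloader
{
    // Hypothetical helper: extracts the charset from a Content-Type header value.
    // Falls back to UTF-8 (an assumption) when no charset is declared.
    public static Encoding CharsetFrom(string contentType)
    {
        var match = Regex.Match(contentType ?? "",
            @"charset=""?([^"";\s]+)", RegexOptions.IgnoreCase);
        // Encoding.GetEncoding throws ArgumentException for an unknown charset name.
        return match.Success ? Encoding.GetEncoding(match.Groups[1].Value) : Encoding.UTF8;
    }

    // Downloads the raw bytes once, then decodes them with the declared charset.
    public static string DownloadWithPageCharset(string url)
    {
        using (var wc = new WebClient())
        {
            byte[] raw = wc.DownloadData(url);
            string contentType = wc.ResponseHeaders["Content-Type"];
            return CharsetFrom(contentType).GetString(raw);
        }
    }
}
```

With this, string result = PageDownloader.DownloadWithPageCharset("https://myurl"); replaces the two DownloadString calls. It only changes how the bytes are decoded, so it would not by itself change text that arrives as eacute; in the page source.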

NOTE: I am using a non-English Windows 10 (if that is relevant).

NOTE 2: The page's special characters appear correctly in any browser.

My question is: why doesn't WebClient download the page correctly even with the correct charset set?

Answer 1

using System.Text;

wc.Encoding = Encoding.UTF8;
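
Separately from the encoding setting, the eacute; / ccedil; / iacute; symptom described in the question resembles HTML named character references (&eacute;, &ccedil;, &iacute;) rather than a charset mismatch. Assuming the downloaded text really contains those references, and the leading & was only lost when the text was displayed, they could be decoded with System.Net.WebUtility.HtmlDecode. A minimal sketch; the class name EntityDecoder, the method name DecodeEntities and the sample string are made up for illustration:

```csharp
using System;
using System.Net;

static class EntityDecoder
{
    // Hypothetical helper: turns HTML character references such as &eacute;
    // into the characters they stand for.
    public static string DecodeEntities(string html)
    {
        return WebUtility.HtmlDecode(html);
    }
}
```

For example, EntityDecoder.DecodeEntities("caf&eacute; fran&ccedil;ais") yields "café français". If the downloaded text truly lacks the &, this decoding will not help and the cause lies elsewhere.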

Answer 1: score -1, posted 2019-04-24 20:58:27