C#: getting a site's source code when it contains non-English letters

Question

I'm trying to get a site's source in C# using

WebClient client = new WebClient();
string content = client.DownloadString(url);

And it gets it just fine. However, the source code contains Hebrew characters, which show up as gibberish in the content variable. What do I need to do for them to be recognized?

Answer 1

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added: decode the response as UTF-8
string content = client.DownloadString(url);

You have to specify the encoding. WebClient is probably decoding the response with its default encoding rather than UTF-8, while the content could be in UTF-8. This is an example where the encoding is set to UTF-8. If you are not sure what the page uses, check the source manually first (for example, the charset declared in the page's meta tag or in the Content-Type response header) and then specify the encoding accordingly. For more info, see the Remarks section of the WebClient.DownloadString documentation.
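As a rough sketch of "check first, then specify the encoding", the download can be done as raw bytes and decoded with whatever charset the server declares. The class and method names below (PageDownloader, EncodingFromContentType, Download) are hypothetical and not part of the original answer, and falling back to UTF-8 when no charset is declared is an assumption, not something the thread prescribes.

```csharp
using System;
using System.Net;
using System.Text;

static class PageDownloader
{
    // Hypothetical helper: returns the encoding named by the charset parameter
    // of a Content-Type header value, or UTF-8 (an assumed fallback) when the
    // header is missing, has no charset, or names an unsupported charset.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            foreach (string part in contentType.Split(';'))
            {
                string p = part.Trim();
                if (p.StartsWith("charset=", StringComparison.OrdinalIgnoreCase))
                {
                    string name = p.Substring("charset=".Length).Trim('"', ' ');
                    try
                    {
                        return Encoding.GetEncoding(name);
                    }
                    catch (ArgumentException)
                    {
                        // Charset name not supported on this runtime; use the fallback.
                    }
                }
            }
        }
        return Encoding.UTF8;
    }

    // Downloads the page as bytes and decodes it with the declared charset.
    public static string Download(string url)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            string contentType = client.ResponseHeaders[HttpResponseHeader.ContentType];
            return EncodingFromContentType(contentType).GetString(raw);
        }
    }
}
```

Calling PageDownloader.Download(url) would then replace the DownloadString call. Note that on .NET Core and later, legacy code pages such as windows-1255 are not available unless the code-pages encoding provider is registered, in which case this sketch silently falls back to UTF-8.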

Answer 2

The problem is the Encoding of your WebClient. MSDN says:

... the method uses the encoding specified in the Encoding property to convert the resource to a String.

Solution: Set a specific Encoding like

client.Encoding = Encoding.UTF8;

and try it again

string content = client.DownloadString(url);

UTF-8 should do the trick and decode the Hebrew characters correctly as well.
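To see why the wrong Encoding produces gibberish, here is a small offline sketch that needs no network access. It encodes a Hebrew word as UTF-8 bytes, as a server might send it, and decodes those bytes with a chosen encoding. The EncodingDemo class and its DecodeUtf8BytesAs method are hypothetical names for illustration, and ISO-8859-1 is used only as a stand-in for "some wrong single-byte encoding".

```csharp
using System;
using System.Text;

static class EncodingDemo
{
    // The Hebrew word "shalom", written with escapes to avoid editor issues
    // with right-to-left text.
    public const string Sample = "\u05E9\u05DC\u05D5\u05DD";

    // Hypothetical helper: encodes the text as UTF-8 (what the server sends)
    // and decodes the bytes with the given encoding (what the client assumes).
    public static string DecodeUtf8BytesAs(string text, Encoding assumed)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        return assumed.GetString(raw);
    }
}
```

Decoding with Encoding.UTF8 returns the original four characters, while decoding with Encoding.GetEncoding("iso-8859-1") yields eight unrelated characters, because each Hebrew letter occupies two bytes in UTF-8 and a single-byte encoding maps every byte to its own character.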

Answer 1: score 1, posted 2016-08-15 19:18:51

Answer 2: score 0, accepted, posted 2016-08-15 19:20:39
