C# Get site source code with letters other than English
I'm trying to get a site's source code in C# using:
WebClient client = new WebClient();
string content = client.DownloadString(url);
It downloads just fine. However, the source code contains Hebrew characters, which show up as gibberish in the content variable. What do I need to do for it to recognize them?
WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added
string content = client.DownloadString(url);
You have to specify the encoding. By default, WebClient decodes the response with Encoding.Default (on .NET Framework, typically the system's ANSI code page), while the content is probably UTF-8. This example sets the encoding to UTF-8. If you are not sure which encoding the page uses, check it manually first (for example, the charset in the Content-Type response header or the <meta charset> tag in the HTML) and then set the encoding accordingly. For more info, see the Remarks section of the WebClient.Encoding documentation.
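One way to "set the encoding accordingly" without hard-coding it is to read the charset the server reports. The following is a minimal sketch, not documented WebClient behavior: `CharsetHelper`, `FromContentType`, and `DownloadWithCharset` are hypothetical names, and it assumes the charset appears in the `Content-Type` response header. It ignores any `<meta charset>` tag in the HTML, and on .NET Core / .NET 5+ legacy code pages such as windows-1255 (a common Hebrew encoding) only resolve after `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`, so the sketch falls back to the supplied default in that case.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: picks a decoder from a Content-Type value such as
// "text/html; charset=utf-8", falling back when no usable charset is given.
public static class CharsetHelper
{
    public static Encoding FromContentType(string contentType, Encoding fallback)
    {
        if (string.IsNullOrEmpty(contentType))
            return fallback;

        // Parameters follow the media type and are separated by ';'.
        foreach (string part in contentType.Split(';'))
        {
            string[] pair = part.Split(new[] { '=' }, 2);
            if (pair.Length == 2 &&
                pair[0].Trim().Equals("charset", StringComparison.OrdinalIgnoreCase))
            {
                // The value may be quoted, e.g. charset="UTF-8".
                string name = pair[1].Trim().Trim('"');
                try
                {
                    return Encoding.GetEncoding(name);
                }
                catch (ArgumentException)
                {
                    // Unknown or unregistered charset name: use the fallback.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Hypothetical usage: download raw bytes, then decode them with the
    // charset the server reported (UTF-8 if it reported none).
    public static string DownloadWithCharset(string url)
    {
        using (var client = new WebClient())
        {
            byte[] data = client.DownloadData(url);
            string contentType = client.ResponseHeaders?["Content-Type"];
            return FromContentType(contentType, Encoding.UTF8).GetString(data);
        }
    }
}
```

`DownloadWithCharset` uses UTF-8 as the fallback; change that argument if the site is known to use a different encoding.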
The problem is the Encoding of your WebClient. MSDN says:
... the method uses the encoding specified in the Encoding property to convert the resource to a String.
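To see why the Encoding property matters, the sketch below (an illustration, not WebClient code) simulates a download with in-memory bytes: it encodes the Hebrew word שלום as UTF-8, then decodes those bytes once with ISO-8859-1, standing in for a wrong default encoding (an assumption chosen for the demo), and once with UTF-8. `DecodingDemo` is a hypothetical name.

```csharp
using System.Text;

public static class DecodingDemo
{
    // Hebrew "shalom", used as sample page content.
    public const string Hebrew = "שלום";

    // Decodes the same UTF-8 bytes with two encodings, mimicking how the
    // result of DownloadString depends on the Encoding property.
    public static (string Wrong, string Right) Decode()
    {
        byte[] bytes = Encoding.UTF8.GetBytes(Hebrew);
        Encoding latin1 = Encoding.GetEncoding(28591); // ISO-8859-1
        return (latin1.GetString(bytes), Encoding.UTF8.GetString(bytes));
    }
}
```

With ISO-8859-1, each Hebrew letter turns into two unrelated Latin-1 characters (the result starts with `×©`), which is the kind of gibberish described in the question.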
Solution: set a specific encoding, for example:
client.Encoding = Encoding.UTF8;
and then try again:
string content = client.DownloadString(url);
UTF-8 should do the trick and decode the Hebrew characters correctly too.
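A quick way to check that UTF-8 can represent Hebrew while ASCII cannot: the hypothetical helper below round-trips a string through an encoding (string to bytes and back). With `Encoding.ASCII`, each Hebrew letter is replaced by `?` because ASCII has no code for it; with `Encoding.UTF8`, the text survives, using two bytes per letter.

```csharp
using System.Text;

public static class HebrewEncodingDemo
{
    // Hebrew "shalom": four letters.
    public const string Word = "שלום";

    // Encodes the text to bytes and decodes it back with the same encoding.
    public static string RoundTrip(string text, Encoding encoding) =>
        encoding.GetString(encoding.GetBytes(text));
}
```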