
C#: Get a site's source code containing letters other than English

I'm trying to get a site's source in C# using

WebClient client = new WebClient();
string content = client.DownloadString(url);

And it gets it just fine. However, the source code contains Hebrew characters, which show up as gibberish in the content variable. What do I need to do for them to be recognized?

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added: decode the response as UTF-8
string content = client.DownloadString(url);

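You have to specify the encoding. WebClient converts the downloaded bytes to a string using its Encoding property, which defaults to the system default encoding (Encoding.Default) rather than UTF-8, while the content is probably in UTF-8. This is an example where the encoding is set to UTF-8. If you are not sure what the page uses, check the source manually first (for example the charset in the Content-Type header or in the page's meta tag) and then specify the encoding accordingly. For more info see Remarks in the documentation.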
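If you would rather not hard-code the encoding, one option is to read the charset the server declares and decode the raw bytes yourself. The following is only a rough sketch: CharsetHelper and FromContentType are hypothetical names made up for illustration, and falling back to UTF-8 when no charset is declared is an assumption, not something the page guarantees.

```csharp
using System;
using System.Net.Mime;
using System.Text;

static class CharsetHelper
{
    // Hypothetical helper: choose an Encoding from a Content-Type header value
    // such as "text/html; charset=utf-8". Falls back to UTF-8 (an assumption)
    // when the header is missing, malformed, or names an unknown charset.
    public static Encoding FromContentType(string contentType)
    {
        if (!string.IsNullOrWhiteSpace(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                {
                    return Encoding.GetEncoding(charset);
                }
            }
            catch (FormatException) { }       // header could not be parsed
            catch (ArgumentException) { }     // charset name is not recognized
            catch (NotSupportedException) { } // charset not available on this platform
        }
        return Encoding.UTF8;
    }
}
```

With a WebClient you could then fetch the bytes with client.DownloadData(url), read client.ResponseHeaders["Content-Type"], and call CharsetHelper.FromContentType(...).GetString(bytes). Note that legacy code pages such as windows-1255 may need to be registered separately on newer .NET runtimes, in which case this sketch silently falls back to UTF-8.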

The problem is the Encoding of your WebClient. MSDN says:

... the method uses the encoding specified in the Encoding property to convert the resource to a String.

Solution: Set a specific Encoding like

client.Encoding = Encoding.UTF8;

and try it again

string content = client.DownloadString(url);

UTF-8 should do the trick, since it can represent the Hebrew characters as well.
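To see why the Encoding setting matters, here is a small offline sketch that needs no network access. It takes a string, encodes it as UTF-8, and decodes the same bytes twice: once as UTF-8 and once as ISO-8859-1. ISO-8859-1 is only a stand-in for "some wrong encoding"; the actual default on your machine may differ. MojibakeDemo is a made-up name for illustration.

```csharp
using System;
using System.Text;

static class MojibakeDemo
{
    // Encodes the text as UTF-8, then decodes the same bytes two ways:
    // with the matching encoding (UTF-8) and with a mismatched one (ISO-8859-1).
    public static (string Correct, string Garbled) Decode(string original)
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(original);

        string correct = Encoding.UTF8.GetString(utf8Bytes);
        string garbled = Encoding.GetEncoding("ISO-8859-1").GetString(utf8Bytes);

        return (correct, garbled);
    }
}
```

Calling MojibakeDemo.Decode("שלום") returns the original text in Correct, while Garbled holds one wrong character per byte, which is the kind of gibberish described in the question.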

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address.
