C# Get site source code with letters other than English

Question

I'm trying to get a site's source in C# using

WebClient client = new WebClient();
string content = client.DownloadString(url);

And it gets it just fine. However, the source contains Hebrew characters, which show up as gibberish in the content variable. What do I need to do so that they are decoded correctly?

Answer 1

WebClient client = new WebClient();
client.Encoding = System.Text.Encoding.UTF8; // added: must be set before DownloadString
string content = client.DownloadString(url);

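You have to specify the encoding. By default, WebClient.Encoding is Encoding.Default (on .NET Framework, the system's ANSI code page), while the page is probably served as UTF-8, so the bytes get decoded with the wrong character set. The example above sets the encoding to UTF-8. If you are not sure which encoding the page uses, check its source or its Content-Type response header first and set the encoding accordingly. For more info, see the Remarks section of the WebClient.Encoding documentation.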
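When the encoding is not known in advance, one option is to download the raw bytes and decode them using the charset named in the Content-Type response header. The following is only a minimal sketch of that idea, not part of the original answer: the PageDownloader class and both method names are hypothetical, and falling back to UTF-8 when no charset is declared is an assumption that will not suit every site.

```csharp
using System;
using System.Net;
using System.Net.Mime;
using System.Text;

// Hypothetical helper class, introduced only for illustration.
static class PageDownloader
{
    // Returns the encoding named by the charset parameter of a Content-Type
    // header value, or UTF-8 (an assumed fallback) if none is usable.
    public static Encoding EncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrWhiteSpace(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    return Encoding.GetEncoding(charset.Trim('"'));
            }
            catch (FormatException) { }   // malformed header value
            catch (ArgumentException) { } // charset name not supported here
        }
        return Encoding.UTF8;
    }

    // Downloads the page as bytes, then decodes with the detected encoding.
    public static string DownloadPage(string url)
    {
        using (var client = new WebClient())
        {
            byte[] raw = client.DownloadData(url);
            string contentType = client.ResponseHeaders?[HttpResponseHeader.ContentType];
            return EncodingFromContentType(contentType).GetString(raw);
        }
    }
}
```

Usage would be string content = PageDownloader.DownloadPage(url);. Note that on .NET Core and later, legacy code pages such as windows-1255 are only available after registering CodePagesEncodingProvider; without that, this sketch silently falls back to UTF-8 for them.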

Answer 2

The problem is the Encoding of your WebClient. MSDN says:

... the method uses the encoding specified in the Encoding property to convert the resource to a String.

Solution: set the encoding explicitly, for example

client.Encoding = Encoding.UTF8;

and then try again:

string content = client.DownloadString(url);

UTF-8 should do the trick, since it can represent the Hebrew characters as well, provided the page is actually served as UTF-8.

2 answers

solution1
1 2016-08-15 19:18:51

solution2
0 ACCEPTED 2016-08-15 19:20:39