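Get HTML from URL - StreamReader uses another character encoding?
I want to get the HTML from this URL: https://store.steampowered.com/app/513710/SCUM/
This should be easy, but I could not do it because of an SSL/TLS error.
So I used the code from this question: Requesting html over https with c# Webclient
In the end I can fill my StreamReader, but when I call ReadToEnd() to get a string, I get a corrupted string, something like this: “ ”
This must be about the character encoding, but if you open: https://store.steampowered.com/app/513710/SCUM/
and then open your browser console, you can see this at the beginning: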
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
And in the code provided:
webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
utf-8 is listed there, so I don't know why I have this problem. I tried replacing:
StreamReader(webClient.OpenRead(steamURL));
with:
StreamReader(webClient.OpenRead(steamURL), Encoding.UTF8, true);
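But it still doesn't get the correct text. I tried to add all the information I could; if you need anything else, I'll edit the question.
Thank you for your time, and have a nice day.
Regards,
David
PS: This is my current code: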
private StreamReader getStreamReader(string steamURL, WebClient webClient)
{
    return new StreamReader(webClient.OpenRead(steamURL), Encoding.UTF8, true);
}

private void getSteamCosts()
{
    // When I try to access a Steam HTML page, an SSL error appears
    // We need a specific security protocol
    // I check all of them, just in case
    ServicePointManager.ServerCertificateValidationCallback =
        new RemoteCertificateValidationCallback(
            delegate
            {
                return true;
            });
    using (WebClient webClient = new WebClient())
    {
        webClient.Headers["User-Agent"] = "Mozilla/5.0 (Windows;"
            + " U; Windows NT 6.0; en-US; rv:1.9.2.6) Gecko/20100625"
            + " Firefox/3.6.6 (.NET CLR 3.5.30729)";
        webClient.Headers["Accept"] = "text/html,application/xhtml+"
            + "xml,application/xml;q=0.9,*/*;q=0.8";
        webClient.Headers["Accept-Language"] = "en-us,en;q=0.5";
        webClient.Headers["Accept-Encoding"] = "gzip,deflate";
        webClient.Headers["Accept-Charset"] = "ISO-8859-1,utf-8;q=0.7,*;q=0.7";
        StreamReader sr = null;
        string steamURL = "https://store.steampowered.com/app/513710/SCUM/";
        try
        {
            // This one should work
            ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls12;
            sr = getStreamReader(steamURL, webClient);
            lbFinalSteam.Text = "TLS12Final";
        }
        catch (Exception) // Bad coding practice, just wanted it to work
        {
            // If that's not the case, I try the rest
            try
            {
                ServicePointManager.SecurityProtocol = SecurityProtocolType.Tls;
                sr = getStreamReader(steamURL, webClient);
                lbFinalSteam.Text = "TLSFinal";
            }
            catch (Exception)
            {
                try
                {
                    ServicePointManager.SecurityProtocol = SecurityProtocolType.Ssl3;
                    sr = getStreamReader(steamURL, webClient);
                    lbFinalSteam.Text = "SSL3Final";
                }
                catch (Exception)
                {
                    try
                    {
                        ServicePointManager.SecurityProtocol =
                            SecurityProtocolType.Tls11;
                        sr = getStreamReader(steamURL, webClient);
                        lbFinalSteam.Text = "TLS11Final";
                    }
                    catch (Exception)
                    {
                        lbFinalSteam.Text = "NoFinal";
                    }
                }
            }
        }
        if (sr != null)
        {
            string allLines = sr.ReadToEnd();
        }
    }
}
Edit: Maybe the problem is how I convert the StreamReader to a string? I mean this line:
string allLines = sr.ReadToEnd();
Should I use something else?
As https://stackoverflow.com/users/246342/alex-k already wrote, the problem was not the encoding: I was receiving a gzip-compressed response. I just removed this:
webClient.Headers["Accept-Encoding"] = "gzip,deflate";
And it works! Thank you, Alex K! :D
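For anyone who would rather keep compression enabled: the "corrupted" string was the gzip body being decoded as UTF-8, because the request asked for gzip and WebClient does not decompress the response by default. Below is a minimal sketch of two possible alternatives to deleting the header, not what I actually used. The names DecompressingWebClient and GzipHtml.DecodeGzipUtf8 are made up for illustration; the first relies on HttpWebRequest.AutomaticDecompression, the second inflates the stream manually and assumes the body really is gzip.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Net;
using System.Text;

// Hypothetical WebClient subclass (name is an assumption): lets the framework
// send Accept-Encoding itself and inflate gzip/deflate bodies before
// OpenRead hands the stream back.
public class DecompressingWebClient : WebClient
{
    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);
        HttpWebRequest http = request as HttpWebRequest;
        if (http != null)
        {
            http.AutomaticDecompression =
                DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }
}

public static class GzipHtml
{
    // Hypothetical helper (name is an assumption): manually inflates a
    // gzip-compressed stream and decodes the result as UTF-8.
    // Only valid when the response carries Content-Encoding: gzip.
    public static string DecodeGzipUtf8(Stream compressed)
    {
        using (GZipStream gzip = new GZipStream(compressed, CompressionMode.Decompress))
        using (StreamReader reader = new StreamReader(gzip, Encoding.UTF8))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the first option, the "Accept-Encoding" line should still be removed from the manual headers and new WebClient() replaced by new DecompressingWebClient(); the rest of getSteamCosts() can stay as it is. With the second, the header stays and webClient.OpenRead(steamURL) is passed to GzipHtml.DecodeGzipUtf8 instead of being wrapped in a StreamReader directly.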