I am trying to get HTML content from the amazon website. Here is my code to create request, response, and get string:
public static HttpWebResponse GetHttpWebResponse(string url)
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.ContentType = "text/xml";
try
{
return (HttpWebResponse)webRequest.GetResponse();
}
catch (WebException e)
{
if (e.Response == null)
throw new Exception("Cannot get response");
return (HttpWebResponse)e.Response;
}
}
public static string GetString(HttpWebResponse response)
{
Encoding encoding = Encoding.UTF8;
using (var reader = new StreamReader(response.GetResponseStream(), encoding))
{
string responseText = reader.ReadToEnd();
return responseText;
}
}
It is working fine with other web sites. However, when I try to get content from amazon, for example: https://www.amazon.com/gp/product/B00AEISSHA/ref=ppx_yo_dt_b_asin_title_o00_s00?ie=UTF8&psc=1 I am seeing encoded content:
I tried to change Encoding
and used HttpUtility.HtmlDecode(html);
but it couldn't help. Is there any simple way to get content from Amazon?
You're not catering for compression. If you update your webrequest like this, it should do the trick.
public static HttpWebResponse GetHttpWebResponse(string url)
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.ContentType = "text/xml";
webRequest.AutomaticDecompression = DecompressionMethods.GZip;
try
{
return (HttpWebResponse)webRequest.GetResponse();
}
catch (WebException e)
{
if (e.Response == null)
throw new Exception("Cannot get response");
return (HttpWebResponse)e.Response;
}
}
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.