HttpClient - Different content returned than browser
I'm trying to make a request to kicksusa.com. If I make the request from any browser, I get the full expected HTML. However, I cannot seem to simulate the request in a way that returns the same HTML; instead I get a 'Request unsuccessful.' message.
Any help is appreciated.
My code:
    using System.Net;
    using System.Net.Http;

    // The statements below run inside an async method.
    var httpClientHandler = new HttpClientHandler
    {
        //Proxy = proxy,
        AllowAutoRedirect = true,
        MaxAutomaticRedirections = 15,
        // Transparently decompress gzip and deflate response bodies.
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
    };
    // Pass the handler to the client; without this the settings above are never used.
    var client = new HttpClient(httpClientHandler);
    // Mirror the request headers seen in the browser trace.
    client.DefaultRequestHeaders.Add("Host", "www.kicksusa.com");
    client.DefaultRequestHeaders.Add("Connection", "keep-alive");
    client.DefaultRequestHeaders.Add("Upgrade-Insecure-Requests", "1");
    client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36");
    client.DefaultRequestHeaders.Add("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8");
    client.DefaultRequestHeaders.Add("Accept-Encoding", "gzip, deflate, sdch");
    client.DefaultRequestHeaders.Add("Accept-Language", "en-GB,en-US;q=0.8,en;q=0.6");
    var _response = await client.GetAsync("http://www.kicksusa.com/jordan-craig/oil-stain-slub-tee-army-green-8909ag.html");
    if (_response.IsSuccessStatusCode)
    {
        // Read the response body as text.
        var _html = await _response.Content.ReadAsStringAsync();
    }
Fiddler trace headers:
Host: www.kicksusa.com
Connection: keep-alive
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.87 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding: gzip, deflate, sdch
Accept-Language: en-GB,en-US;q=0.8,en;q=0.6
This website uses some dedicated technology from Incapsula to prevent automated access to the website.
On the first request, the site returns a web document with an embedded iframe. Only when the iframe source is then loaded is a cookie set, followed by a redirect to the page. All further requests then succeed immediately because the browser sends the cookie information.
In order to circumvent the mechanism, you would have to load the iframe after the first request, remember the cookie and then send the cookie with all further requests. There's also a lot of JavaScript code involved in the first response, which would probably have to be executed for the Incapsula check to succeed.
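The "remember the cookie" part is ordinary HttpClient behaviour and can be illustrated on its own. The following is only a minimal sketch: the class name `CookieSessionSketch`, the `CreateClient` helper, the `example.com` address and the `session=abc123` cookie are all made up for illustration. It shows how a `CookieContainer` attached to the handler stores a cookie and which `Cookie` header later requests would carry. It does not execute any JavaScript, so it should not be expected to pass a check like the one described above.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class CookieSessionSketch
{
    // Builds a client whose handler reads and writes cookies in the given container.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    public static void Main()
    {
        // Placeholder address (an assumption), not the site from the question.
        var site = new Uri("http://example.com/");
        var cookies = new CookieContainer();

        using (var client = CreateClient(cookies))
        {
            // Simulate what the handler does when a response carries
            // "Set-Cookie: session=abc123; Path=/" (a made-up cookie).
            cookies.SetCookies(site, "session=abc123; Path=/");

            // Any later request to the same site through this client
            // would carry the following Cookie header.
            Console.WriteLine(cookies.GetCookieHeader(site));
        }
    }
}
```

Because the container belongs to the handler, reusing one `HttpClient` instance for all requests is what keeps the cookie state; a fresh client with a fresh container starts without any cookies.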
However, when the site specifically uses such a technology to prevent automatic access to its content, any attempt to circumvent this mechanism must be considered undesired and a criminal act. You should not try to automatically gather data from a site without its owner's approval, especially not when a technology such as Incapsula is used to make this more difficult.
See also this answer by an Incapsula employee for more details.