Scope:
I am developing a C# aplication to simulate queries into this site . I am quite familiar with simulating web requests for achieving the same human steps, but using code instead.
If you want to try yourself, just type this number into the CNPJ box: 08775724000119
and write the captcha and click on Confirmar
I've dealed with the captcha already, so it's not a problem anymore.
Problem:
As soon as i execute the POST request for a "CNPJ", a exception is thrown:
The remote server returned an error: (403) Forbidden.
Fiddler Debugger Output:
This is the request generated by my browser, not by my code
POST https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco HTTP/1.1
Host: www.sefaz.rr.gov.br
Connection: keep-alive
Content-Length: 208
Cache-Control: max-age=0
Origin: https://www.sefaz.rr.gov.br
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: GX_SESSION_ID=gGUYxyut5XRAijm0Fx9ou7WnXbVGuUYoYTIKtnDydVM%3D; JSESSIONID=OVuuMFCgQv9k2b3fGyHjSZ9a.undefined
// PostData :
_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&_CONINSESTG=08775724000119&cfield=rice&_VALIDATIONRESULT=1&BUTTON1=Confirmar&sCallerURL=http%3A%2F%2Fwww.sintegra.gov.br%2Fnew_bv.html
Code samples and References used:
I'm using a self developed library to handle/wrap the Post and Get requests.
The request object has the same parameters (Host,Origin, Referer, Cookies..) as the one issued by the browser (logged my fiddler up here).
I've also managed to set the ServicePointValidator
of certificates by using:
ServicePointManager.ServerCertificateValidationCallback =
new RemoteCertificateValidationCallback (delegate { return true; });
After all that configuration, i stil getting the forbidden exception.
Here is how i simulate the request and the exception is thrown
try
{
this.Referer = Consts.REFERER;
// PARAMETERS: URL, POST DATA, ThrownException (bool)
response = Post (Consts.QUERYURL, postData, true);
}
catch (Exception ex)
{
string s = ex.Message;
}
Thanks in advance for any help / solution to my problem
Update 1:
I was missing the request for the homepage, which generates cookies (Thanks @W0lf for pointing me that out)
Now there's another weird thing. Fiddler is not showing my Cookies on the request, but here they are :
I made a successful request using the browser and recorded it in Fiddler.
The only things that differ from your request are:
sCallerURL
parameter (I have sCallerURL=
instead of sCallerURL=http%3A%2F%2Fwww....
) Accept-Language:
values (I'm pretty sure this is not important) Content-Length
is different (obviously) OK, I thought the Fiddler trace was from your application. In case you are not setting cookies on your request, do this:
https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco
. If you examine the response, you'll notice the website sends two session cookies. If you don't know how to store the cookies and use them in the other request, take a look here .
OK, I managed to reproduce the 403, figured out what caused it, and found a fix.
What happens in the POST request is that:
.NET's HttpWebRequest attempts to do this redirect seamlessly, but in this case there are two issues (that I would consider bugs in the .NET implementation):
the GET request after the POST(redirect) has the same content-type as the POST request ( application/x-www-form-urlencoded
). For GET requests this shouldn't be specified
cookie handling issue (the most important issue) - The website sends two cookies: GX_SESSION_ID
and JSESSIONID
. The second has a path specified ( /sintegra
), while the first does not.
Here's the difference: the browser assigns by default a path of /
(root) to the first cookie, while .NET assigns it the request url path ( /sintegra/servlet/hwsintco
).
Due to this, the last GET request (after redirect) to /sintegra/servlet/hwsintpe...
does not get the first cookie passed in, as its path does not correspond.
To do this, tell it to not follow redirects:
postRequest.AllowAutoRedirect = false
and then read the redirect location from the POST response and manually do a GET request on it.
For this, the fix I found was to take the misplaced cookie from the CookieContainer, set it's path correctly and add it back to the container in the correct location.
This is the code to do it:
private void FixMisplacedCookie(CookieContainer cookieContainer)
{
var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];
misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"
//place the cookie in thee right place...
cookieContainer.SetCookies(
new Uri("https://www.sefaz.rr.gov.br/"),
misplacedCookie.ToString());
}
using System;
using System.IO;
using System.Net;
using System.Text;
namespace XYZ
{
public class Crawler
{
const string Url = "https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco";
public void Crawl()
{
var cookieContainer = new CookieContainer();
/* initial GET Request */
var getRequest = (HttpWebRequest)WebRequest.Create(Url);
getRequest.CookieContainer = cookieContainer;
ReadResponse(getRequest); // nothing to do with this, because captcha is f#@%ing dumb :)
/* POST Request */
var postRequest = (HttpWebRequest)WebRequest.Create(Url);
postRequest.AllowAutoRedirect = false; // we'll do the redirect manually; .NET does it badly
postRequest.CookieContainer = cookieContainer;
postRequest.Method = "POST";
postRequest.ContentType = "application/x-www-form-urlencoded";
var postParameters =
"_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&" +
"_CONINSESTG=08775724000119&cfield=much&_VALIDATIONRESULT=1&BUTTON1=Confirmar&" +
"sCallerURL=";
var bytes = Encoding.UTF8.GetBytes(postParameters);
postRequest.ContentLength = bytes.Length;
using (var requestStream = postRequest.GetRequestStream())
requestStream.Write(bytes, 0, bytes.Length);
var webResponse = postRequest.GetResponse();
ReadResponse(postRequest); // not interested in this either
var redirectLocation = webResponse.Headers[HttpResponseHeader.Location];
var finalGetRequest = (HttpWebRequest)WebRequest.Create(redirectLocation);
/* Apply fix for the cookie */
FixMisplacedCookie(cookieContainer);
/* do the final request using the correct cookies. */
finalGetRequest.CookieContainer = cookieContainer;
var responseText = ReadResponse(finalGetRequest);
Console.WriteLine(responseText); // Hooray!
}
private static string ReadResponse(HttpWebRequest getRequest)
{
using (var responseStream = getRequest.GetResponse().GetResponseStream())
using (var sr = new StreamReader(responseStream, Encoding.UTF8))
{
return sr.ReadToEnd();
}
}
private void FixMisplacedCookie(CookieContainer cookieContainer)
{
var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];
misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"
//place the cookie in thee right place...
cookieContainer.SetCookies(
new Uri("https://www.sefaz.rr.gov.br/"),
misplacedCookie.ToString());
}
}
}
有时HttpWebRequest需要代理初始化:request.Proxy = new WebProxy(); //在我的情况下,它不需要参数,但是您可以将其设置为代理地址
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.