
Error 403 Forbidden when simulating a request via C#

Scope:

I am developing a C# application to simulate queries against this site. I am quite familiar with simulating web requests to reproduce the steps a human would take, using code instead.

If you want to try it yourself, type this number into the CNPJ box: 08775724000119, enter the captcha, and click Confirmar.

I've already dealt with the captcha, so it's not a problem anymore.

Problem:

As soon as I execute the POST request for a "CNPJ", an exception is thrown:

The remote server returned an error: (403) Forbidden.

Fiddler Debugger Output:

Link for Fiddler Download

This is the request generated by my browser, not by my code:

POST https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco HTTP/1.1
Host: www.sefaz.rr.gov.br
Connection: keep-alive
Content-Length: 208
Cache-Control: max-age=0
Origin: https://www.sefaz.rr.gov.br
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko)    Chrome/23.0.1271.97 Safari/537.11
Content-Type: application/x-www-form-urlencoded
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco
Accept-Encoding: gzip,deflate,sdch
Accept-Language: pt-BR,pt;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: GX_SESSION_ID=gGUYxyut5XRAijm0Fx9ou7WnXbVGuUYoYTIKtnDydVM%3D;   JSESSIONID=OVuuMFCgQv9k2b3fGyHjSZ9a.undefined


//    PostData : 
_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&_CONINSESTG=08775724000119&cfield=rice&_VALIDATIONRESULT=1&BUTTON1=Confirmar&sCallerURL=http%3A%2F%2Fwww.sintegra.gov.br%2Fnew_bv.html
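The PostData line above is an application/x-www-form-urlencoded body. As a rough sketch only (the asker's own library is not shown), such a body could be assembled from name/value pairs as follows. FormBody and Encode are hypothetical names, and Uri.EscapeDataString is just one of the encoders .NET offers, so the escaping of a few characters may differ from what the browser sent.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Hypothetical helper: joins name/value pairs into "a=1&b=2" form,
    // percent-encoding every name and value.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

Fields with an empty value, such as _MSG, come out as "_MSG=", matching the shape of the captured body.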

Code samples and References used:

I'm using a self-developed library to handle/wrap the POST and GET requests.

The request object has the same parameters (Host, Origin, Referer, Cookies, ...) as the one issued by the browser (whose Fiddler log is shown above).

I've also set the certificate validation callback on ServicePointManager by using:

ServicePointManager.ServerCertificateValidationCallback = 
    new RemoteCertificateValidationCallback (delegate { return true; });

After all that configuration, I am still getting the Forbidden exception.

Here is how I simulate the request; this is where the exception is thrown:

        try
        {
            this.Referer = Consts.REFERER;

            // PARAMETERS: URL, POST DATA, ThrownException (bool)
            response = Post (Consts.QUERYURL, postData, true);
        }
        catch (Exception ex)
        {
            string s = ex.Message;
        }
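The catch block above keeps only ex.Message. When debugging a 403 it can help to look at the HTTP response attached to the exception. The following is a minimal sketch, assuming the wrapper library lets a WebException propagate; WebErrorInfo and Describe are hypothetical names.

```csharp
using System;
using System.Net;

public static class WebErrorInfo
{
    // Hypothetical helper: summarises a WebException so the HTTP status,
    // reason phrase and final URL are not lost.
    public static string Describe(WebException ex)
    {
        var http = ex.Response as HttpWebResponse;
        if (http == null)
        {
            // No HTTP response at all (DNS failure, timeout, TLS error, ...).
            return "No HTTP response: " + ex.Status;
        }
        return "HTTP " + (int)http.StatusCode + " " + http.StatusDescription +
               " from " + http.ResponseUri;
    }
}
```

Calling Describe from a catch (WebException ex) clause shows which URL actually returned the error, which matters when the failing request is a redirect target rather than the original POST.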

Thanks in advance for any help / solution to my problem.

Update 1:

I was missing the request for the homepage, which generates the cookies (thanks @W0lf for pointing that out).

Now there's another weird thing. Fiddler is not showing my cookies on the request, but here they are:

I made a successful request using the browser and recorded it in Fiddler.

The only things that differ from your request are:

  • my browser sent no value for the sCallerURL parameter (I have sCallerURL= instead of sCallerURL=http%3A%2F%2Fwww....)
  • the session ids are different (obviously)
  • I have other Accept-Language: values (I'm pretty sure this is not important)
  • the Content-Length is different (obviously)

Update

OK, I thought the Fiddler trace was from your application. In case you are not setting cookies on your request, do this:

  • before posting data, do a GET request to https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco. If you examine the response, you'll notice the website sends two session cookies.
  • when you do the POST request, make sure to attach the cookies you got in the previous step

If you don't know how to store the cookies and use them in the other request, take a look here.
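The two steps above can be sketched with a single CookieContainer shared by both requests. This is an illustration under assumptions, not the asker's library: SessionSketch and Create are hypothetical names, and only the URL comes from the thread.

```csharp
using System;
using System.Net;

public static class SessionSketch
{
    public const string Url = "https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco";

    // Creates a request bound to the shared cookie jar. Nothing is sent
    // until the caller invokes GetRequestStream() or GetResponse().
    public static HttpWebRequest Create(string method, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(Url);
        request.Method = method;
        request.CookieContainer = jar; // same instance for the GET and the POST
        return request;
    }
}
```

With var jar = new CookieContainer(), calling GetResponse() on Create("GET", jar) stores the Set-Cookie values in jar, and a later Create("POST", jar) sends whichever of those cookies match the request URL.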

Update 2

The problems

OK, I managed to reproduce the 403, figured out what caused it, and found a fix.

What happens in the POST request is that:

  • the server responds with status 302 (temporary redirect) and the redirect location
  • the browser redirects (basically does a GET request) to that location, also sending the two cookies.

.NET's HttpWebRequest attempts to do this redirect seamlessly, but in this case there are two issues (that I would consider bugs in the .NET implementation):

  1. the GET request after the POST (redirect) has the same content-type as the POST request (application/x-www-form-urlencoded). For GET requests this shouldn't be specified.

  2. cookie handling issue (the most important issue) - The website sends two cookies: GX_SESSION_ID and JSESSIONID. The second has a path specified (/sintegra), while the first does not.

Here's the difference: the browser assigns by default a path of / (root) to the first cookie, while .NET assigns it the request URL path (/sintegra/servlet/hwsintco).

Due to this, the last GET request (after redirect) to /sintegra/servlet/hwsintpe... does not get the first cookie passed in, as its path does not correspond.
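The path mismatch can be reproduced offline. The sketch below sets the cookie path explicitly to mimic each client, instead of relying on the runtime's default-path logic, which may differ between .NET versions. CookiePathDemo, JarWithPath and CountSent are hypothetical names, the cookie value is a placeholder, and the redirect URL is used only up to the part shown in the text.

```csharp
using System;
using System.Net;

public static class CookiePathDemo
{
    const string PostUrl = "https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco";

    // Builds a jar holding GX_SESSION_ID with the given path, as if it had
    // been received in the response to PostUrl.
    public static CookieContainer JarWithPath(string path)
    {
        var jar = new CookieContainer();
        var cookie = new Cookie("GX_SESSION_ID", "placeholder") { Path = path };
        jar.Add(new Uri(PostUrl), cookie);
        return jar;
    }

    // Returns how many cookies the jar would attach to a request for url.
    public static int CountSent(CookieContainer jar, string url)
    {
        return jar.GetCookies(new Uri(url)).Count;
    }
}
```

For a request to https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintpe, a jar built with the path /sintegra/servlet/hwsintco attaches no cookie, while a jar built with the path / attaches it.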

The fixes

  • For the redirect problem (GET with content-type), the fix is to do the redirect manually, instead of relying on .NET for this.

To do this, tell it not to follow redirects:

postRequest.AllowAutoRedirect = false;

and then read the redirect location from the POST response and manually do a GET request on it.
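A minimal sketch of that manual step, assuming the Location header may be either absolute or relative (the thread does not say which form this server sends). RedirectSketch, ResolveLocation and CreateFollowUp are hypothetical names.

```csharp
using System;
using System.Net;

public static class RedirectSketch
{
    // Resolves a Location header against the URL that was requested.
    // Handles both absolute and relative Location values.
    public static Uri ResolveLocation(Uri requestUri, string location)
    {
        if (string.IsNullOrEmpty(location))
            throw new InvalidOperationException("Response carried no Location header.");
        return new Uri(requestUri, location);
    }

    // Builds the follow-up GET. No Content-Type is set because a GET has no body.
    public static HttpWebRequest CreateFollowUp(Uri target, CookieContainer jar)
    {
        var request = (HttpWebRequest)WebRequest.Create(target);
        request.Method = "GET";
        request.CookieContainer = jar;
        return request;
    }
}
```

The location argument would come from the Location header of the POST response, and the request returned by CreateFollowUp is sent with GetResponse() once the cookie jar has been corrected.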

  • For the cookie problem, the fix I found was to take the misplaced cookie from the CookieContainer, set its path correctly and add it back to the container in the correct location.

This is the code to do it:

private void FixMisplacedCookie(CookieContainer cookieContainer)
{
    var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];

    misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"

    // place the cookie in the right place...
    cookieContainer.SetCookies(
        new Uri("https://www.sefaz.rr.gov.br/"), 
        misplacedCookie.ToString());
}

Here's all the code to make it work:

using System;
using System.IO;
using System.Net;
using System.Text;

namespace XYZ
{
    public class Crawler
    {

        const string Url = "https://www.sefaz.rr.gov.br/sintegra/servlet/hwsintco";

        public void Crawl()
        {
            var cookieContainer = new CookieContainer();

            /* initial GET Request */
            var getRequest = (HttpWebRequest)WebRequest.Create(Url);
            getRequest.CookieContainer = cookieContainer;
            ReadResponse(getRequest); // nothing to do with this, because captcha is f#@%ing dumb :)

            /* POST Request */
            var postRequest = (HttpWebRequest)WebRequest.Create(Url);

            postRequest.AllowAutoRedirect = false; // we'll do the redirect manually; .NET does it badly
            postRequest.CookieContainer = cookieContainer;
            postRequest.Method = "POST";
            postRequest.ContentType = "application/x-www-form-urlencoded";

            var postParameters =
                "_EventName=E%27CONFIRMAR%27.&_EventGridId=&_EventRowId=&_MSG=&_CONINSEST=&" +
                "_CONINSESTG=08775724000119&cfield=much&_VALIDATIONRESULT=1&BUTTON1=Confirmar&" +
                "sCallerURL=";

            var bytes = Encoding.UTF8.GetBytes(postParameters);

            postRequest.ContentLength = bytes.Length;

            using (var requestStream = postRequest.GetRequestStream())
                requestStream.Write(bytes, 0, bytes.Length);

            var webResponse = postRequest.GetResponse();

            ReadResponse(postRequest); // not interested in this either

            var redirectLocation = webResponse.Headers[HttpResponseHeader.Location];

            var finalGetRequest = (HttpWebRequest)WebRequest.Create(redirectLocation);


            /* Apply fix for the cookie */
            FixMisplacedCookie(cookieContainer);

            /* do the final request using the correct cookies. */
            finalGetRequest.CookieContainer = cookieContainer;

            var responseText = ReadResponse(finalGetRequest);

            Console.WriteLine(responseText); // Hooray!
        }

        private static string ReadResponse(HttpWebRequest getRequest)
        {
            using (var responseStream = getRequest.GetResponse().GetResponseStream())
            using (var sr = new StreamReader(responseStream, Encoding.UTF8))
            {
                return sr.ReadToEnd();
            }
        }

        private void FixMisplacedCookie(CookieContainer cookieContainer)
        {
            var misplacedCookie = cookieContainer.GetCookies(new Uri(Url))[0];

            misplacedCookie.Path = "/"; // instead of "/sintegra/servlet/hwsintco"

            // place the cookie in the right place...
            cookieContainer.SetCookies(
                new Uri("https://www.sefaz.rr.gov.br/"),
                misplacedCookie.ToString());
        }
    }
}

Sometimes HttpWebRequest needs its proxy initialized: request.Proxy = new WebProxy(); // in my case it needed no arguments, but you can set it to your proxy address
