
Mimicking a website request captured in Fiddler for screen scraping

I'm trying to log into a website to download data through my account. This is the raw Fiddler Request for the POST login form.

POST login/login.jsp HTTP/1.1
Host: server.com
Connection: keep-alive
Content-Length: 73
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Origin: https://server.com
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://server.com/login/login.jsp
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Cookie: __utma=109610308.114257620.1370889472.1373479499.1371761934.3; __utmc=109613338; __utmz=109610308.1373249472.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=(not%20provided); _bcvm_vid_424161365915852877=4393500994580715020; _bcvm_vrid_424161365915852877=4393492275825713189; WT_FPC=id=199.234.233.42-2645888112.30303753:lv=1371356395815:ss=1371758333825; JSESSIONID=RGJGy4yQ2WCXRPbnhxCTKGb2rZh39b67d8g8PktTQLqfsBQTlTlYLTD!1154156211; BIGipServeresuite_prod_pool=295635768.2713643.0000

It then responds with:

HTTP/1.1 302 Moved Temporarily
Date: Fri, 21 Jun 2013 12:39:46 GMT
Location: https://server.com/login/redirect.jsp?APPLICATION=0
Content-Type: text/html
Set-Cookie: SECURITY_SESSION_ID=383826514*198399234219875960; domain=.server.com; path=/
Connection: Close
Set-Cookie: BIGipServeresuite_prod_pool=294168768.27163.0000; expires=Fri, 21-Jun-2013 13:09:47 GMT; path=/
Content-Length: 3669

That SECURITY_SESSION_ID is what's needed to do anything on the site.

To mimic it I wrote this:

   //GET the login page - I perform a quick GET to pick up the first two important cookies

          HttpWebRequest GETLoginRequest = (HttpWebRequest)HttpWebRequest.Create("https://server.com/login/login.jsp");
          GETLoginRequest.Method = "GET";
          GETLoginRequest.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
          GETLoginRequest.AllowAutoRedirect = false;
          GETLoginRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)";
          GETLoginRequest.CookieContainer = cookieJar;  

          HttpWebResponse GETLoginResponse = (HttpWebResponse)GETLoginRequest.GetResponse(); //Gets the JSession and BIGipServer cookies
          Console.Write(" \n 3rd count after GETLoginResponse : " + cookieJar.Count + "\n");

 //POST Login

          HttpWebRequest POSTLoginRequest = (HttpWebRequest)HttpWebRequest.Create("https://server.com/login/login.jsp");
          POSTLoginRequest.Method = "POST";
          WebHeaderCollection myWebHeaderCollection = POSTLoginRequest.Headers;
          POSTLoginRequest.AllowAutoRedirect = true;
          byte[] bytes = Encoding.ASCII.GetBytes(formParams);

        //Cache
          POSTLoginRequest.Headers.Add(HttpRequestHeader.CacheControl, "max-age=0"); 

        //Client
          POSTLoginRequest.Accept = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
          POSTLoginRequest.Headers.Add(HttpRequestHeader.AcceptEncoding, "gzip,deflate,sdch"); 
          myWebHeaderCollection.Add("Accept-Language:en-US");
          POSTLoginRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)";
        //Cookies/Login
          POSTLoginRequest.CookieContainer = cookieJar; //The cookie jar mainly contains cookies I added manually
        //Entity
          POSTLoginRequest.ContentLength = bytes.Length;
          POSTLoginRequest.ContentType = "Content-Type: application/x-www-form-urlencoded";
        //Miscellaneous
          POSTLoginRequest.Headers.Add("Origin: https://server.com");
          POSTLoginRequest.Referer = "https://server.com/login/login.jsp";
        //Transport

              //Fix I found to allow Connection: Keep-Alive
          var sp = POSTLoginRequest.ServicePoint;
          var prop = sp.GetType().GetProperty("HttpBehaviour", BindingFlags.Instance | BindingFlags.NonPublic);
          prop.SetValue(sp, (byte)0, null);

          ServicePointManager.Expect100Continue = false;

          POSTLoginRequest.Host = "server.com";

          using (Stream os = POSTLoginRequest.GetRequestStream())
          {
              os.Write(bytes, 0, bytes.Length);
          }
          HttpWebResponse POSTLoginResponse = (HttpWebResponse)POSTLoginRequest.GetResponse();
          Console.Write(" \n 4th count after POSTLoginResponse : " + cookieJar.Count + "\n");

In the end, my request in Fiddler looks like this:

POST /login/login.jsp HTTP/1.1
Cache-Control: max-age=0
Accept: application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)
Content-Type: Content-Type: application/x-www-form-urlencoded
Origin: https://server.com
Referer: https://server.com/login/login.jsp
Host: esuite.pjm.com
Cookie: __utma=1096103034.235016339.1371048460.1371048460.1371048460.1; __utmz=109610428.1371048460.1.1.utmcsr=bing|utmccn=(organic)|utmcmd=organic|utmctr=pjm; _bcvm_vrid_424161365915852877=4393493862784729423; WT_FPC=id=199.234.233.42-3603288592.30304123:lv=1371044861062:ss=1371044859892; JSESSIONID=RGbYQd7JnPdNkTvtGCzQ9NLyFgfBnnyLFzbvKPg2Y0gLnhL2hp8F!-1770592471; BIGipServeresuite_prod_pool=327723200.27163.0000
Content-Length: 73
Connection: Keep-Alive

That is pretty much the same, and yet I get this as a response:

HTTP/1.1 200 OK
Date: Fri, 21 Jun 2013 13:58:17 GMT
Content-Length: 3356
Content-Type: text/html
Set-Cookie: BIGipServeresuite_prod_pool=327723200.27163.0000; expires=Fri, 21-Jun-2013 14:28:17 GMT; path=/

I suspect part of the problem is that in the browser the reply is a 302 Moved Temporarily, but I really don't know. The two important cookies seem to be JSESSIONID and BIGipServer, because those are set by the site. I added the other cookies manually; they appear to be Google Analytics cookies, and I don't think they matter much. The headers are nearly identical, but the server still isn't responding with the SECURITY_SESSION_ID that I'm looking for. Does anyone have any idea what I'm doing wrong?

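Your Accept headers are different, and the Content-Type you submit is wrong. The value includes the header name, so the request goes out as "Content-Type: Content-Type: application/x-www-form-urlencoded":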

POSTLoginRequest.ContentType = "Content-Type: application/x-www-form-urlencoded";

Should be

POSTLoginRequest.ContentType = "application/x-www-form-urlencoded";

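Make sure the POST parameters are the same as in the Fiddler example. URL-encode each parameter value before building the form string and converting it to bytes (encoding the whole string at once would also escape the = and & separators):

HttpUtility.UrlEncode(value)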

See if that works.

A mismatch between ContentLength and the actual length of the body could also cause this, so check that too.
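
The points above can be put together in a minimal sketch. It is an illustration under assumptions, not the site's actual login contract: the field names "username" and "password" and their values are hypothetical placeholders (use whatever the Fiddler capture shows), and BuildFormBody is a helper made up for this example. The request is only constructed here, not sent.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;
using System.Web; // HttpUtility; on .NET Framework this needs a reference to System.Web

static class LoginSketch
{
    // Hypothetical helper: builds an application/x-www-form-urlencoded body,
    // encoding each key and value separately so "=" and "&" stay as separators.
    static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(HttpUtility.UrlEncode(field.Key) + "=" + HttpUtility.UrlEncode(field.Value));
        }
        return string.Join("&", parts);
    }

    static void Main()
    {
        // Assumed field names and values, for illustration only.
        var fields = new[]
        {
            new KeyValuePair<string, string>("username", "me@example.com"),
            new KeyValuePair<string, string>("password", "p&ss=word"),
        };

        string formParams = BuildFormBody(fields);
        byte[] bytes = Encoding.ASCII.GetBytes(formParams);

        var cookieJar = new CookieContainer();
        var request = (HttpWebRequest)WebRequest.Create("https://server.com/login/login.jsp");
        request.Method = "POST";
        request.CookieContainer = cookieJar;

        // Only the media type goes here; the framework adds the header name itself.
        request.ContentType = "application/x-www-form-urlencoded";

        // Must equal the number of bytes that will be written to the request stream.
        request.ContentLength = bytes.Length;

        // Print the encoded body and its length so they can be compared with the Fiddler capture.
        Console.WriteLine(formParams);
        Console.WriteLine(bytes.Length);
    }
}
```

Comparing the printed body and length with the body and Content-Length of the browser's request in Fiddler is a quick way to confirm that the parameters really match before sending anything.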

