
Windows WebBrowser problem with Ajax code in a page - c#

I'm downloading a site for its content using a web crawler I wrote with the Microsoft WebBrowser control.

Part of the site's content is sent only after some kind of verification from the client side; my guess is that it's cookies / session cookies.

When I try to download the page from my crawler, I see (with Fiddler's help) that the inner Ajax request sends 'false' for one of the parameters, and the data is not received. When I perform the same action from any browser, Fiddler shows that the parameter is sent as '1'.

After a day of testing, I would be grateful for any lead. Is there a way to manipulate this parameter? Plant cookies? Any other idea?

Following khunj's answer, I'm adding the headers from IE and from my WebBrowser:

In both captures I removed the fields that have the same value.

From IE:

GET /feed/prematch/1-1-234562-8527419630-1-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
Referer: http://www.mySite.com/ref=12345
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0)
Connection: Keep-Alive
Cookie: __utma=1.1088924975.1299439925.1299976891.1300010848.14; 
__utmz=1.1299439925.1.1.utmcsr=(direct)|utmccn=
(direct)|__utmb=2.1.10.1300010848; __utmc=136771054; user_cookie=63814658; 
user_hash=58b923a5a234ecb78b7cc8806a0371c5; user_time=1297166428; infobox_8=1; 
user_login_id=12345;  mySite=5e1c0u8g6qh41o2798ua2bfbi3

HTTP/1.1 200 OK
Date: Sun, 13 Mar 2011 10:07:38 GMT
Server: Apache
Last-Modified: Sun, 13 Mar 2011 10:07:25 GMT
ETag: "26a6d9-19df-49e5a5c9ed140"
Accept-Ranges: bytes
Content-Length: 6623
Cache-Control: max-age=0, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Connection: close
Content-Type: text/plain
Content-Encoding: gzip

From WebBrowser:

GET /feed/prematch/1-1-234562-8527419630-false-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
Referer: http://www.mySite.com/ref=12345
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0)
Connection: Keep-Alive
Cookie: __utma=1.1782626598.1299416994.1299974912.1300011023.129; 
__utmb=2.1.10.1300011023; __utmz=1.1299416994.1.1.utmcsr=
(direct)|utmccn=(direct)|__utmc=136771054; user_cookie=65192487; 
user_hash=6425034050442671103fdd614e4a2932; user_time=1299416986; 
user_full_time_zone=37;user_login_id=12345; mySite=q9qlqqm9bunm9siho32tdqdjo0


HTTP/1.1 404 Not Found
Date: Sun, 13 Mar 2011 10:10:33 GMT
Server: Apache
Content-Length: 313
Connection: close
Content-Type: text/html; charset=iso-8859-1

Thanks in advance,

Oz.

Well, the server is obviously treating the request from your crawler differently. Since you already have Fiddler involved, what is different in your request headers when you make the request from IE versus from your crawler? I say IE because the WebBrowser control uses the same engine as IE to do its work.
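The comparison suggested here can be done mechanically. The following is a minimal sketch in Python, not part of the original thread: the helper names `parse_request` and `diff_requests` are made up for illustration, and the two request texts are shortened excerpts of the captures posted in the question.

```python
def parse_request(raw):
    """Split a raw HTTP request into (request_line, {header_name: value})."""
    lines = [ln for ln in raw.strip().splitlines() if ln.strip()]
    request_line = lines[0].strip()
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, so values containing ':' stay intact.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers


def diff_requests(raw_a, raw_b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs."""
    line_a, headers_a = parse_request(raw_a)
    line_b, headers_b = parse_request(raw_b)
    diffs = {}
    if line_a != line_b:
        diffs["request-line"] = (line_a, line_b)
    for name in sorted(set(headers_a) | set(headers_b)):
        value_a, value_b = headers_a.get(name), headers_b.get(name)
        if value_a != value_b:
            diffs[name] = (value_a, value_b)
    return diffs


# Shortened excerpts of the two captures from the question.
IE_REQUEST = """
GET /feed/prematch/1-1-234562-8527419630-1-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
User-Agent: Mozilla/4.0 (compatible; MSIE 8.0)
"""

WEBBROWSER_REQUEST = """
GET /feed/prematch/1-1-234562-8527419630-false-2.dat HTTP/1.1
x-requested-with: XMLHttpRequest
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0)
"""

for field, (ie_value, wb_value) in diff_requests(IE_REQUEST, WEBBROWSER_REQUEST).items():
    print(f"{field}:\n  IE:         {ie_value}\n  WebBrowser: {wb_value}")
```

For these excerpts the sketch reports two differences: the request line (the `1` versus `false` field in the feed file name) and the `User-Agent` version string.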

I solved my problem by using Fiddler as a proxy and defining a custom rule that rewrites the outgoing request: whenever the PathAndQuery property contains the site address, the rule replaces 'false' with '1'. It's not the most elegant solution, but it fits my problem.
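The rewrite itself is a small string substitution. Below is a hedged sketch of that logic in Python rather than FiddlerScript, so it is not the rule I actually used. The function name `rewrite_path` and the exact pattern are assumptions based on the feed URL shown in the question; in Fiddler the equivalent logic would sit in the request handler of the custom rules file and operate on the session's PathAndQuery value.

```python
import re

# Assumption: the flag is the second-to-last dash-separated field of the
# feed file name, as in /feed/prematch/1-1-234562-8527419630-false-2.dat
FEED_FLAG = re.compile(r"^(/feed/prematch/(?:\d+-)+)false(-\d+\.dat)$")


def rewrite_path(path_and_query):
    """Replace the 'false' flag in a feed path with '1'; leave other paths alone."""
    return FEED_FLAG.sub(r"\g<1>1\g<2>", path_and_query)


print(rewrite_path("/feed/prematch/1-1-234562-8527419630-false-2.dat"))
print(rewrite_path("/some/other/page?flag=false"))
```

Anchoring the pattern to the feed path keeps the substitution from touching an unrelated 'false' elsewhere in a URL, which a plain search-and-replace would do.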

I learned the most from these two pages:

FiddlerScript CookBook

A site that explains the customRules.js file and the field I needed to edit

Thanks for the help, Oz.
