Inconsistent POSTing between Web Browser and HttpWebRequest

I'm working on web scraping using C# HttpWebRequest/HttpWebResponse. For the most part this process has gone smoothly. But after POSTing my way through several pages, I have gotten stuck on what seems to be an inconsistency between what the web browser does and what my HttpWebRequest/HttpWebResponse calls do.

The problem occurs when I land on a page containing an input element that has a name similar to this: "RidiculouslyLongInputName.RidiculouslyLongInputName.RidiculouslyLongInputName.@RidiculouslyLong"

POSTing a value for this input element causes a 500 error when using HttpWebRequest, but works fine when POSTing through the browser. If I remove this input value from the post data, the HttpWebRequest no longer gets the 500 error, but then I'm stuck with a data validation error from the website.

Any idea why HttpWebRequest is failing?

It's at times like these that packet sniffers are extremely useful for seeing exactly what data is flowing through and where the difference is.

Wireshark (http://www.wireshark.org/) is a great tool for things like this.

Filter down to only the domains you're interested in, then send the request with HttpWebRequest. Save the captured packet data somewhere. Repeat, but this time make the request through the browser. Compare the two captures.

If it is indeed an issue with the POST variables, it should be evident in the HTTP payload.
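One thing worth checking when comparing the two payloads is how the field names themselves are encoded. A browser percent-encodes both names and values in an `application/x-www-form-urlencoded` body, so a name containing `@` goes over the wire as `%40`. Whether that is what breaks the request in the question is only an assumption; nothing in the thread confirms it. The sketch below uses a hypothetical helper, `FormEncoder.Encode`, and made-up field names to show a body built that way with `System.Net.WebUtility.UrlEncode`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

public static class FormEncoder
{
    // Hypothetical helper: builds an application/x-www-form-urlencoded body.
    // Both the field name and the value are encoded, then pairs are joined with '&'.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            WebUtility.UrlEncode(f.Key) + "=" + WebUtility.UrlEncode(f.Value)));
    }
}
```

For example, encoding the pairs `("Long.Name.@Field", "some value")` and `("other", "1")` yields `Long.Name.%40Field=some+value&other=1`. If the captured browser request shows the name encoded differently from what your code sends, that difference is a good place to start.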

Not sure why you are running into the problem, but I would recommend grabbing a copy of Fiddler and taking a look at what the browser is sending in the POST request. It is possible that something less than obvious is going on.

You can also use the Firebug extension for Firefox. With this extension installed and enabled, go through the entire scenario in Firefox. Firebug will show you the exact request and response exchanged by the browser. You can then duplicate that request as closely as possible using HttpWebRequest.
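A minimal sketch of what duplicating the browser's request might look like with HttpWebRequest. The class name `BrowserLikePost`, its parameters, and the choice of which headers to copy are assumptions for illustration; the headers and cookies that actually matter are whatever the captured browser request shows.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

public static class BrowserLikePost
{
    // Configures a POST request with the values copied from the captured browser request.
    // Nothing is sent until Send is called.
    public static HttpWebRequest Create(string url, string referer, string userAgent, CookieContainer cookies)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.UserAgent = userAgent;     // copy the browser's User-Agent string
        request.Referer = referer;         // the page that contained the form
        request.CookieContainer = cookies; // reuse one container across all requests in the session
        return request;
    }

    // Writes the already-encoded form body and returns the response text.
    // GetResponse throws a WebException for error statuses such as 500.
    public static string Send(HttpWebRequest request, string formBody)
    {
        byte[] payload = Encoding.UTF8.GetBytes(formBody);
        request.ContentLength = payload.Length;
        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Sharing a single `CookieContainer` across the whole sequence of pages keeps session cookies flowing from one request to the next, which is something the browser does automatically.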

First, thanks for the MEF response. That case was my own mistake, so I deleted the question.

I think the best tool for your case is Fiddler, but I suspect there is other JavaScript attached to that button, or something similar, that you are failing to mimic. WebRequest cannot do that for you, whereas WebBrowser can, since it works on the DOM.

To use WebRequest correctly, you really need to reverse engineer every request with something like Fiddler. It's very hard to work out exactly what is going on just by looking at the page's source (and the JavaScript/CSS it references).
