Scraping with HtmlAgilityPack

I am using HtmlAgilityPack to scrape pages in C# ASP.NET. So far I have had no problems scraping several websites; however, when I run the following code I get an error:

var getHtmlWeb = new HtmlWeb();
var homePage = getHtmlWeb.Load("https://www.corfo.cl/sites/cpp/home");

The error that appears is:

"Connection terminated: Unexpected sending error."

The only site giving me problems is Corfo, and I don't know how to solve this. I appreciate your help.

This site relies on cookies to work. For example, one of the URLs it requests is https://www.corfo.cl/sites/Satellite;jsessionid=T8w78ZolfWgr3ZoEBBvE81nBiXbXIdjfF1In3bgpZiYvL_w8TF4p!1081543155!-596930586?c=Page&cid=1456408322328&pagename=CorfoPortalPublico/Page/corfoListadoOfertaInteligenteWebLayout

So, when you request www.corfo.cl, it first forwards you to www.corfo.cl/sites/cpp/home; then, under the /sites/ path, it sets the cookie jsessionid=OHS_1~T8w78ZolfWgr3ZoEBBvE81nBiXbXIdjfF1In3bgpZiYvL_w8TF4p!1081543155!-596930586, and so on.

With this cookie, the page builds itself from all or some of the components associated with this jsessionid.

If the client code doesn't handle this logic, as with the two lines in the question, the server resets the connection, as expected, because it doesn't know how to build the page without the jsessionid.
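One way to keep the session cookie across the redirect is to fetch the page with an HttpClient that shares a CookieContainer, and then hand the HTML to HtmlAgilityPack. This is only a minimal sketch: the class and method names (CorfoScraper, CreateClient, LoadAsync) are made up for illustration, and I have not confirmed that cookie handling alone is enough for this particular site.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HtmlAgilityPack.
public static class CorfoScraper
{
    // Builds an HttpClient that follows redirects and stores any cookies
    // (such as jsessionid) in the supplied container, so later requests
    // made with the same client send them back.
    public static HttpClient CreateClient(CookieContainer cookies)
    {
        var handler = new HttpClientHandler
        {
            CookieContainer = cookies,
            UseCookies = true,
            AllowAutoRedirect = true
        };
        return new HttpClient(handler);
    }

    // Downloads the page as a string and parses it with HtmlAgilityPack.
    public static async Task<HtmlDocument> LoadAsync(HttpClient client, string url)
    {
        string html = await client.GetStringAsync(url);
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc;
    }

    public static async Task Main()
    {
        var cookies = new CookieContainer();
        using (var client = CreateClient(cookies))
        {
            var doc = await LoadAsync(client, "https://www.corfo.cl/sites/cpp/home");
            // Print the page title (if any) and how many cookies were stored.
            Console.WriteLine(doc.DocumentNode.SelectSingleNode("//title")?.InnerText);
            Console.WriteLine("Cookies stored: " + cookies.Count);
        }
    }
}
```

If you reuse the same client (and therefore the same CookieContainer) for follow-up requests under /sites/, the jsessionid set by the first response should be sent automatically.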

The inner exception from System.Net.WebException is {"Authentication failed because the remote party has closed the transport stream."}

Hope this helps!
