I am trying to crawl a .NET site with PHP cURL. The site that I am trying to crawl is
http://waltham.patriotproperties.com
I am able to crawl the home page.
But when I try to crawl internal pages such as
http://waltham.patriotproperties.com/about.asp
or any other page on that subdomain, I get the following error:
The page cannot be displayed because an internal server error has occurred.1
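The code I am using is as follows:
```php
<?php
$ch = curl_init();
$urlLogin = "http://www.waltham.patriotproperties.com";
curl_setopt($ch, CURLOPT_URL, $urlLogin);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt"); // send cookies stored in this file
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");  // save received cookies to this file
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);     // follow HTTP redirects
//curl_setopt ($ch, CURLOPT_RETURNTRANSFER, 1);
// With RETURNTRANSFER disabled, curl_exec() prints the response body itself and
// returns TRUE, so the echo below adds a stray "1" after the page output.
$data = curl_exec($ch);
echo $data;
```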
The code works for
http://waltham.patriotproperties.com/
but does not work for
http://waltham.patriotproperties.com/search.asp
http://waltham.patriotproperties.com/summary.asp
i.e. any URL within this subdomain. The response I get for URLs inside the subdomain is:
```
HTTP/1.1 500 Internal Server Error
Content-Type: text/html
Server: Microsoft-IIS/7.5
Date: Wed, 05 Jun 2013 16:33:57 GMT
Content-Length: 75
```
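Two things make the output above harder to diagnose: with `CURLOPT_RETURNTRANSFER` commented out, `curl_exec()` prints the response and returns `TRUE`, which `echo` turns into the trailing "1"; and the HTTP status is only visible if you dump the raw headers. A minimal sketch of a fetch that captures the body and reads the status with `curl_getinfo()` might look like this. The helper name `fetch_page()` is hypothetical, and it assumes the caller has already set the cookie options on the handle.

```php
<?php
// Hypothetical helper (not the original poster's code): fetch one URL with an
// existing cURL handle and return the HTTP status plus the body as a string.
function fetch_page($ch, string $url): array
{
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, as in the original code

    $body = curl_exec($ch);
    if ($body === false) {
        // Transport-level failure (DNS, refused connection, etc.), not an HTTP error code.
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    return [
        // HTTP status of the last response, e.g. 200, or 500 for the failing pages.
        'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'body'   => $body,
    ];
}
```

Usage would be along the lines of `$page = fetch_page($ch, 'http://waltham.patriotproperties.com/search.asp');` followed by a check such as `if ($page['status'] !== 200) { ... }`, so a 500 is detected in code rather than by reading the echoed page.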
You're starting out at:
$urlLogin = "http://www.waltham.patriotproperties.com";
But the link for the search page is at:
http://waltham.patriotproperties.com/search.asp
If you surf to that URL, you'll see the content; if you add the www. to the start of the URL, it works. The two host names differ, so make sure every request in your script uses the same one.
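One way to keep the host consistent is to build every URL from a single base and reuse one handle and one cookie file for the whole crawl. The sketch below assumes (this is not confirmed by the site) that the server keeps its session in a cookie scoped to the exact host name, so a cookie set for `www.waltham.patriotproperties.com` is not sent to `waltham.patriotproperties.com`. The names `site_url()` and `crawl_same_host()` are hypothetical.

```php
<?php
// Assumption: the session cookie is tied to the exact host name, so mixing
// "www.waltham..." and "waltham..." means the cookie is not sent back.

// Build a URL on the same scheme/host/port as $base, whatever path is given.
function site_url(string $base, string $path): string
{
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $base");
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $parts['scheme'] . '://' . $parts['host'] . $port . '/' . ltrim($path, '/');
}

// Fetch several pages in order with ONE handle and ONE cookie file, all on the
// host of $base, so cookies set by earlier pages are sent with later requests.
function crawl_same_host(string $base, array $paths, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile); // send stored cookies
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);  // save cookies when the handle is freed
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $results = [];
    foreach ($paths as $path) {
        $url = site_url($base, $path);
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $results[$url] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body'   => $body === false ? null : $body, // null on transport failure
        ];
    }
    // $ch is released when the function returns; cURL writes the cookie jar then.
    return $results;
}
```

Called as `crawl_same_host('http://waltham.patriotproperties.com', ['/', '/search.asp', '/summary.asp'], 'cookie.txt')`, the home page is requested first, so any session cookie it sets would accompany the later requests to the same host.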
Edited to add: this becomes a lot easier if they have an API you can use.