

Crawling a .NET site with a subdomain

I am trying to crawl a .NET site with PHP cURL. The site I am trying to crawl is

http://waltham.patriotproperties.com

I am able to crawl the site's home page.

But when I try to crawl internal pages such as

http://waltham.patriotproperties.com/about.asp

or any other page on that subdomain, I get the following error:

The page cannot be displayed because an internal server error has occurred.1

The code I am using is as follows:

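$ch = curl_init();
$urlLogin = "http://www.waltham.patriotproperties.com";
curl_setopt($ch, CURLOPT_URL, $urlLogin);
// Load and save cookies in cookie.txt so they persist between requests
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
// Return the response as a string; without this, curl_exec() prints the
// page itself and returns true, so "echo $data" prints an extra "1"
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

$data = curl_exec($ch);
if ($data === false) {
    // Report transport-level failures (DNS, connection, timeout)
    echo "cURL error: " . curl_error($ch);
} else {
    echo $data;
}
curl_close($ch);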

The code works for

http://waltham.patriotproperties.com/

but does not work for

http://waltham.patriotproperties.com/search.asp
http://waltham.patriotproperties.com/summary.asp

i.e. for any URL within this subdomain. The response I get for URLs inside the subdomain is:

HTTP/1.1 500 Internal Server Error
Content-Type: text/html
Server: Microsoft-IIS/7.5
Date: Wed, 05 Jun 2013 16:33:57 GMT
Content-Length: 75 
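The status line above (`HTTP/1.1 500`) is the most useful signal here, and a script can check it instead of relying on the printed page. A minimal sketch, assuming the headers are captured with cURL's `CURLOPT_HEADER` option and split from the body using `CURLINFO_HEADER_SIZE`; the function names `final_http_status` and `fetch_with_headers` are hypothetical and do not come from the thread. When the raw headers are not needed, `curl_getinfo($ch, CURLINFO_HTTP_CODE)` returns the code more directly.

```php
<?php
/**
 * Hypothetical helper: return the status code of the final response in a
 * raw header block. With CURLOPT_FOLLOWLOCATION enabled, the header block
 * contains one status line per redirect hop, so the last one is returned.
 * Returns null if no status line is present.
 */
function final_http_status(string $rawHeaders): ?int
{
    if (!preg_match_all('#^HTTP/\S+\s+(\d{3})#m', $rawHeaders, $matches)) {
        return null;
    }
    return (int) end($matches[1]);
}

/**
 * Hypothetical helper: fetch a URL and return [status, headers, body].
 * CURLINFO_HEADER_SIZE is the total size of all headers received, so it
 * splits the headers from the body even when redirects were followed.
 */
function fetch_with_headers(string $url, string $cookieFile): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true); // include response headers in the output

    $response = curl_exec($ch);
    if ($response === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }

    $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    curl_close($ch);

    $headers = substr($response, 0, $headerSize);
    $body = substr($response, $headerSize);
    return [final_http_status($headers), $headers, $body];
}
```

For example, `[$status, $headers, $body] = fetch_with_headers("http://waltham.patriotproperties.com/search.asp", "cookie.txt");` lets the crawler branch on `$status === 500` rather than on the error text.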

You're starting out at:

$urlLogin   =   "http://www.waltham.patriotproperties.com";

But the link to the search page is:

http://waltham.patriotproperties.com/search.asp

If you browse to that URL, you'll see the content; if you add the www. to the start of the URL, it works.
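The starting URL uses the `www.` hostname while the internal links use the bare `waltham.` hostname. One possible consequence, offered here as an assumption because the thread does not confirm the cause: a cookie set without a `Domain` attribute is host-only (RFC 6265), so a session cookie issued by `www.waltham.patriotproperties.com` would not be sent to `waltham.patriotproperties.com`, and a page that expects that session could fail. A minimal sketch that builds every request from one base URL, visits the home page first, and reuses one cURL handle and cookie jar; `build_url` and `crawl_same_host` are hypothetical names, not from the thread.

```php
<?php
/**
 * Hypothetical helper: resolve a site-relative path (e.g. "/search.asp")
 * against one fixed base URL so every request uses the same scheme and host.
 */
function build_url(string $base, string $path): string
{
    $parts = parse_url($base);
    if ($parts === false || !isset($parts['host'])) {
        throw new InvalidArgumentException("Base URL has no host: $base");
    }
    $scheme = $parts['scheme'] ?? 'http';
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $scheme . '://' . $parts['host'] . $port . '/' . ltrim($path, '/');
}

/**
 * Hypothetical helper: fetch the home page and then each internal page
 * through a single cURL handle. Once CURLOPT_COOKIEFILE is set, the handle
 * keeps received cookies in memory and sends them on later requests;
 * CURLOPT_COOKIEJAR writes them to disk when the handle is closed.
 */
function crawl_same_host(string $base, array $paths, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

    $pages = [];
    // Visit "/" first so any cookies it sets are stored before the other pages
    foreach (array_merge(['/'], $paths) as $path) {
        $url = build_url($base, $path);
        curl_setopt($ch, CURLOPT_URL, $url);
        $body = curl_exec($ch);
        $pages[$url] = [
            'status' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'body' => $body === false ? 'cURL error: ' . curl_error($ch) : $body,
        ];
    }
    curl_close($ch);
    return $pages;
}
```

Usage would look like `crawl_same_host("http://waltham.patriotproperties.com", ["/search.asp", "/summary.asp"], "cookie.txt")`, after which each entry's `status` shows whether that page still returns 500. Reusing one handle also lets cURL reuse the connection between requests.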

Edited to add: this becomes a lot easier if they have an API you can use.

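Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.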

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM