I'm trying to index many hundreds of web pages.
In development everything worked fine, but once I started indexing far more than a few test pages, cURL stopped working after a few runs. It no longer gets any data from the remote server.
These are the errors cURL printed (not all at once, of course):
I'm working on a virtual server (V-Server) and also tried connecting to the remote server with Firefox and with wget; nothing got through either. But when I connect to that remote server from my local machine, everything works fine.
After waiting a few hours, it works again for a few runs.
To me this looks like a problem on the remote server, or some kind of DDoS protection. What do you think?
How often does the script run? It could well be triggering some DoS-like protection. I would recommend adding a random delay between requests so they appear more "natural".
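A minimal sketch of such a random delay in PHP, assuming the pages are fetched in a loop; the 2–10 second bounds are arbitrary example values, not a recommendation from the thread:

```php
<?php
// Sketch: pick a random pause between fetches so the request
// pattern looks less mechanical. The 2-10 s bounds are arbitrary.
function randomDelaySeconds(int $min = 2, int $max = 10): int
{
    return rand($min, $max);
}

// Inside your crawl loop, before each request, you would call:
//   sleep(randomDelaySeconds());

$delay = randomDelaySeconds();
echo "sleeping {$delay}s before the next request\n";
```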
You should use proxies when you send out that many requests, since the site's DDoS protection (or a similar setup) can block your IP.
Here are some things to note (what I used when scraping data from websites):
1. Use proxies.
2. Use random user agents.
3. Use random referers.
4. Add a random delay between cron runs.
5. Add a random delay between requests.
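The rotation steps above can be sketched with PHP's cURL options; all of the user agents, referers, and the target URL below are placeholder values, and the proxy list is left empty for you to fill in:

```php
<?php
// Sketch: rotate user agent, referer and (optionally) a proxy per request.
// Every list entry below is a placeholder; substitute your own values.
$userAgents = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:115.0) Gecko/20100101 Firefox/115.0',
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36',
];
$referers = ['https://www.google.com/', 'https://www.bing.com/'];
$proxies  = [];  // e.g. 'host:port' entries; left empty in this sketch

$ua      = $userAgents[array_rand($userAgents)];
$referer = $referers[array_rand($referers)];

$ch = curl_init('https://example.com/page');   // placeholder target URL
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_USERAGENT, $ua);
curl_setopt($ch, CURLOPT_REFERER, $referer);
if ($proxies !== []) {
    curl_setopt($ch, CURLOPT_PROXY, $proxies[array_rand($proxies)]);
}
// $html = curl_exec($ch);  // enable when you are ready to fetch
curl_close($ch);
```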
What I would do is make the script run forever and add a sleep between requests:
ignore_user_abort(1);
set_time_limit(0);
Just trigger it by visiting the URL for a second, and it will keep running forever.
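Putting the two calls above together with a paced loop might look like this sketch; the URL queue is a placeholder, and the bounded `while` stands in for the endless run described above:

```php
<?php
// Sketch: keep the script alive after the visitor disconnects and
// loop over the work with a random pause between fetches.
ignore_user_abort(true);   // keep running if the visitor closes the page
set_time_limit(0);         // lift PHP's execution time limit

$queue   = ['https://example.com/a', 'https://example.com/b'];  // placeholder URLs
$fetched = 0;

while ($queue !== []) {      // use while (true) with a refilled queue for an endless run
    $url = array_shift($queue);
    // ... fetch and index $url with cURL here ...
    $fetched++;
    usleep(rand(500000, 1500000));  // 0.5-1.5 s random pause (tune as needed)
}
```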