
CURL fails after many runs saying “could not establish connection” or “connect() timed out”

I'm trying to index many hundreds of web pages.

In Short

  1. Calling a PHP script using a CRON-job
  2. Getting some (only around 15) of the least recently updated URLs
  3. Querying these URLs using CURL

The Problem

In development everything went fine. But when I started to index many more than a few test pages, CURL stopped working after some runs. It no longer gets any data from the remote server.

Error messages

CURL has printed these errors (not all at once):

  1. couldn't connect to host
  2. Operation timed out after 60000 milliseconds with 0 bytes received
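
To tell these two failures apart in the script, the cURL error number can be checked after each request. The following is only a minimal sketch, assuming PHP 7 or later with the cURL extension: `describeCurlError` and `fetchUrl` are hypothetical helper names, the 10-second connect timeout is an arbitrary choice, and the 60-second total timeout mirrors the message above. The numbers 7 and 28 are the libcurl codes for CURLE_COULDNT_CONNECT and CURLE_OPERATION_TIMEDOUT.

```php
<?php
// Hypothetical helper: map a libcurl error number to a short label.
function describeCurlError(int $errno): string
{
    switch ($errno) {
        case 0:
            return 'ok';
        case 7:  // CURLE_COULDNT_CONNECT ("couldn't connect to host")
            return 'connect_failed';
        case 28: // CURLE_OPERATION_TIMEDOUT ("Operation timed out after ...")
            return 'timeout';
        default:
            return 'other';
    }
}

// Hypothetical helper: fetch one URL and return the body plus error details.
function fetchUrl(string $url, int $timeoutSeconds = 60): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);      // return the body instead of printing it
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);        // assumed limit for the connect phase
    curl_setopt($ch, CURLOPT_TIMEOUT, $timeoutSeconds);  // limit for the whole transfer

    $body  = curl_exec($ch);
    $errno = curl_errno($ch);

    return [
        'body'    => $body === false ? null : $body,
        'errno'   => $errno,
        'kind'    => describeCurlError($errno),
        'message' => curl_error($ch),
    ];
}
```

Logging `kind` and `message` together with a timestamp for every request would show whether the failures start after a certain number of runs, which is what a block on the remote side would look like.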

I'm working on a V-Server and also tried to connect to the remote server from there using Firefox and wget. That did not work either. But when I connect to that remote server from my local machine, everything works fine.

After waiting some hours, it works again for a few runs.

To me it looks like a problem on the remote server, or DDoS protection or something like that. What do you guys think?

How often does the script run? It really could be triggering some DoS-like protection. I would recommend adding a random delay between requests so that they appear more "natural".

You should use proxies when you send out many requests, because the site's DDoS protection or a similar setup can block your IP.

Here are some things to note (this is what I used when scraping data from websites):

1. Use proxies.

2. Use random user agents.

3. Use random referers.

4. Use a random delay in the cron jobs.

5. Use a random delay between requests.
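
Points 2 and 5 of the list above could look roughly like the following. This is a sketch under stated assumptions, not the answerer's actual code: `randomDelaySeconds`, `pickRandom`, and `crawl` are hypothetical names, the 2 to 8 second range is an arbitrary example, and the caller has to supply its own list of user-agent strings. It assumes PHP 7 or later with the cURL extension.

```php
<?php
// Hypothetical helper: a random pause length in whole seconds, bounds inclusive.
function randomDelaySeconds(int $minSeconds, int $maxSeconds): int
{
    return random_int($minSeconds, $maxSeconds);
}

// Hypothetical helper: pick one entry from a non-empty list of strings.
function pickRandom(array $items): string
{
    return $items[array_rand($items)];
}

// Hypothetical helper: fetch each URL with a random pause and user agent.
function crawl(array $urls, array $userAgents): array
{
    $results = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first) {
            // Pause between requests; the range is an assumed example.
            sleep(randomDelaySeconds(2, 8));
        }
        $first = false;

        $ch = curl_init($url);
        curl_setopt_array($ch, [
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_USERAGENT      => pickRandom($userAgents),
            CURLOPT_TIMEOUT        => 60,
        ]);
        // curl_exec returns the body as a string, or false on failure.
        $results[$url] = curl_exec($ch);
    }
    return $results;
}
```

A referer or proxy could be set in the same options array with `CURLOPT_REFERER` and `CURLOPT_PROXY`. With around 15 URLs per run, a 2 to 8 second pause keeps a whole batch at roughly one to two minutes.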

What I would do is make the script run forever and add a sleep in between.

ignore_user_abort(true); // keep running after the client disconnects
set_time_limit(0);       // remove the execution time limit

Just trigger it by visiting the URL for a second and it will run forever.
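
Put together, such a long-running script might be structured like this. It is only a sketch: `runLoop` is a hypothetical name, the batch callback stands in for whatever indexes the next URLs, and the optional `$maxRuns` limit is an addition that is not part of the suggestion above (it only makes the loop stoppable for testing).

```php
<?php
// Hypothetical helper: call $processBatch repeatedly with a pause in between.
// With $maxRuns = null the loop never ends, as suggested above.
function runLoop(callable $processBatch, int $pauseSeconds, ?int $maxRuns = null): int
{
    ignore_user_abort(true); // keep running after the client disconnects
    set_time_limit(0);       // remove the execution time limit

    $runs = 0;
    while ($maxRuns === null || $runs < $maxRuns) {
        $processBatch();      // e.g. index the next batch of URLs
        $runs++;
        sleep($pauseSeconds); // wait before the next batch
    }
    return $runs;
}
```

A call such as `runLoop($indexNextBatch, 300)` would process one batch every five minutes until the process is stopped, where `$indexNextBatch` is a placeholder for the caller's own function.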
