
Warning: file_get_contents(): php_network_getaddresses: getaddrinfo

I own a dedicated server running CentOS 7 with a web panel installed and PHP 8.1.13.

Everything had been working fine for the last 12 days until I tried to run a simple web crawler script.

<?php
include("simple_html_dom.php");
$html = file_get_html("https://www.bbc.com");
echo $html;

foreach ($html->find("div li") as $h)
{
    echo $h->text();
}

?>

It gave me the following errors:

Warning: file_get_contents(): php_network_getaddresses: getaddrinfo for www.bbc.com failed: Name or service not known in home/myserver/public_html/news/simple_html_dom.php on line 82

Warning: file_get_contents(https://www.bbc.com): Failed to open stream: php_network_getaddresses: getaddrinfo for www.bbc.com failed: Name or service not known in /home/myserver/public_html/news/simple_html_dom.php on line 82

Fatal error: Uncaught Error: Call to a member function find() on bool in home/myserver/public_html/news/tim.php:6 Stack trace: #0 {main} thrown in /home/myserver/public_html/news/tim.php on line 6

I also looked at the logs on the server. The same error appears for many other domains too.

[Screenshot of the server logs]

To find a solution, I watched many video tutorials and read several posts. As far as I understand, this problem is caused by a misconfiguration on the DNS server side. I tried the suggested solutions, but none of them worked for me. I would be very thankful if someone could guide me in fixing this DNS error.

I have already checked the basic configuration on the server side and it looked fine to me. Moreover, I am very wary of breaking the existing configuration.
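To narrow down whether name resolution fails inside PHP itself, a small check like the one below may help. It is only a diagnostic sketch: `resolve_host` is a hypothetical helper name, and the host list is an arbitrary example. It relies on the documented behaviour that `gethostbyname()` returns its input unchanged when the lookup fails.

```php
<?php
// Hypothetical helper (not part of the original script): returns the IPv4
// address PHP resolves for $host, or null when resolution fails.
function resolve_host(string $host): ?string
{
    // An IP literal needs no DNS lookup, so return it as-is.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return $host;
    }
    $ip = gethostbyname($host);
    // gethostbyname() returns the input string unchanged on failure.
    return ($ip === $host) ? null : $ip;
}

// Example hosts only; replace with the domains that appear in your logs.
foreach (['www.bbc.com', 'example.com'] as $host) {
    $ip = resolve_host($host);
    echo $host, ' => ', $ip ?? 'NOT RESOLVED', PHP_EOL;
}
```

If this prints NOT RESOLVED when run through the web server but the same names resolve from a shell on the machine, the difference probably lies in the environment PHP runs in rather than in the crawler script.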

It seems like your IP address was blocked or temporarily restricted. Crawling is far more complex than just fetching data with file_get_contents. You should take care of headers, cookies, and possibly sessions, so that your requests look like those of a normal user.
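As a rough illustration of sending headers with the request, here is a sketch that uses a stream context with `file_get_contents`. The `fetch_page` name, the header values, and the timeout are all assumptions chosen for the example, not values the BBC requires. It also checks the return value, so a failed request does not turn into a fatal error later on.

```php
<?php
// Hypothetical helper: fetches $url with browser-like headers and returns
// the body, or null on failure.
function fetch_page(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'  => 'GET',
            'timeout' => $timeoutSeconds,
            'header'  => implode("\r\n", [
                // Placeholder header values; adjust to your own crawler.
                'User-Agent: Mozilla/5.0 (compatible; ExampleCrawler/1.0)',
                'Accept: text/html,application/xhtml+xml',
                'Accept-Language: en-GB,en;q=0.9',
            ]),
        ],
    ]);
    // @ suppresses the warning; the false return value is checked instead.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

$body = fetch_page('https://www.bbc.com');
if ($body === null) {
    echo 'Request failed: ', error_get_last()['message'] ?? 'unknown error', PHP_EOL;
} else {
    echo strlen($body), ' bytes received', PHP_EOL;
}
```

The returned string could then be handed to `str_get_html()` from simple_html_dom instead of calling `file_get_html()` directly. Note that headers only help once the host name resolves. A getaddrinfo failure happens before any request is sent.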

Also, you can use a public API. Big sites like the BBC usually have one to avoid heavy load on their HTTP front end. For example: https://apitracker.io/a/bbc-news

Also, you can subscribe to the BBC's RSS feed and check for updates through it. That will make it much easier to organise downloading content. For example: https://gist.github.com/mburst/5230448
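To show what reading a feed might look like in PHP, here is a minimal sketch using SimpleXML. The `parse_rss_items` function is a hypothetical helper, and the inline feed is made-up sample data so the example runs without network access. A real feed would be downloaded first and its XML passed in as a string.

```php
<?php
// Hypothetical helper: extracts title/link pairs from an RSS 2.0 document.
function parse_rss_items(string $xml): array
{
    // Collect parse errors internally instead of emitting warnings.
    libxml_use_internal_errors(true);
    $feed = simplexml_load_string($xml);
    if ($feed === false || !isset($feed->channel->item)) {
        return [];
    }
    $items = [];
    foreach ($feed->channel->item as $item) {
        $items[] = [
            'title' => trim((string) $item->title),
            'link'  => trim((string) $item->link),
        ];
    }
    return $items;
}

// Made-up sample feed so the sketch runs offline.
$sample = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>
XML;

foreach (parse_rss_items($sample) as $item) {
    echo $item['title'], ' - ', $item['link'], PHP_EOL;
}
```

Because a feed is structured XML, there is no need to guess at selectors such as "div li", which tends to make this approach less fragile than scraping the HTML page.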
