cURL crawler gets "Access Denied" in PHP

Question

I'm creating a crawler to capture some public information. However, it returns:

Access Denied
You don't have permission to access "http://www.americanas.com.br/" on this server.

When I test the request in Postman, it works perfectly. I even took the cURL code generated by Postman (shown below), but when I run it directly on my PHP server, it returns the error above.

My cURL code:

$curl = curl_init();

curl_setopt_array($curl, array(
    CURLOPT_URL => "https://www.americanas.com.br/",
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_ENCODING => "",
    CURLOPT_MAXREDIRS => 10,
    CURLOPT_TIMEOUT => 30,
    CURLOPT_HTTP_VERSION => CURL_HTTP_VERSION_1_1,
    CURLOPT_CUSTOMREQUEST => "GET",
    CURLOPT_HTTPHEADER => array(
        "cache-control: no-cache",
        "postman-token: 112ebf89-1bb7-aa7a-0645-cdeabcf96488"
    ),
));

$response = curl_exec($curl);
$err = curl_error($curl);

curl_close($curl);

if ($err) {
    echo "cURL Error #:" . $err;
} else {
    echo $response;
}
exit();

Answer 1

I found that some sites have more complex blocking mechanisms. In these cases, it is necessary to use a more complete crawler solution. The one I'm using, and which is working, is ProxyCrawl (https://proxycrawl.com/).

Answer 2

Postman is querying https://www.americanas.com.br/, while the error message suggests that your crawler is querying http://www.americanas.com.br/.
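One way to check this is to ask cURL what it actually requested. The following is a minimal sketch, not a definitive fix: the helper names `usesHttps` and `fetchWithDiagnostics` are hypothetical and not part of the original code. It assumes the PHP cURL extension is installed and only reports the scheme, status code, and any redirect target so the PHP request can be compared with the one sent by Postman.

```php
<?php
// Hypothetical helper: true when the URL string uses the https scheme.
function usesHttps(string $url): bool
{
    return strtolower((string) parse_url($url, PHP_URL_SCHEME)) === 'https';
}

// Hypothetical helper: perform a GET and report what cURL actually did.
// Redirects are NOT followed, so a 301/302 to another scheme stays visible.
function fetchWithDiagnostics(string $url): array
{
    $curl = curl_init();
    curl_setopt_array($curl, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $body = curl_exec($curl);
    $info = [
        'error'         => curl_error($curl),
        'status'        => curl_getinfo($curl, CURLINFO_HTTP_CODE),
        'effective_url' => curl_getinfo($curl, CURLINFO_EFFECTIVE_URL),
        'redirect_url'  => curl_getinfo($curl, CURLINFO_REDIRECT_URL),
        'body'          => $body,
    ];
    curl_close($curl);

    return $info;
}
```

Calling `fetchWithDiagnostics('https://www.americanas.com.br/')` and printing `status`, `effective_url`, and `redirect_url` should show whether the request leaves the server as http or https. If the scheme already matches Postman and the response is still "Access Denied", the cause is probably something other than the URL.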

Answer 1: score 1, accepted, 2021-08-22 14:43:03
Answer 2: score 0, 2021-08-02 21:39:07