
Does PHP or cURL cache their web responses somehow?

I have a PHP script making requests to some web site. I run this script from the command line, so no web server on my side is involved. Just plain PHP and a shell.

The response is split into pages, so I need to make multiple requests to gather all the data in one script run. Obviously, the request URL is identical except for one parameter. Nothing complicated:

$base_url = '...';
$pages = ...; // a number I receive elsewhere
$delay = ...; // a delay to avoid making too many requests
$p = 0;
while ($p < $pages) {
   $url = $base_url . "&some_param=$p";
   ... // Here cURL takes its turn because of cookies
   sleep($delay);
   $p++;
}

The pages I get this way all look the same: like the first one that was requested. (So I just get one repetitive list multiplied by the number of pages.)

I decided this happens because of some caching on the web server's end, which persists despite an additional random parameter I pass. Closing and reinitializing the cURL session doesn't help either.
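For what it's worth, the usual way to rule out an intermediate cache is to ask it to revalidate explicitly via request headers, and to stop cURL from reusing connections. This is only a sketch of that idea (the `$url` placeholder is not from the question, and the question does not confirm this helps):

```php
<?php
$url = 'https://example.com/list?foo=bar&some_param=0'; // placeholder URL

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Ask any intermediate cache not to serve a stored copy.
curl_setopt($ch, CURLOPT_HTTPHEADER, [
    'Cache-Control: no-cache',
    'Pragma: no-cache',
]);
// Also disable cURL's own connection reuse, to be thorough.
curl_setopt($ch, CURLOPT_FRESH_CONNECT, true);
curl_setopt($ch, CURLOPT_FORBID_REUSE, true);

$html = curl_exec($ch);
curl_close($ch);
```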

I also noticed that if I quickly change the initial $p value manually (so requests start from a different page) and then launch the script again, the result changes. I do this faster than the $delay value. It means that two different requests made from the same script run give the same result, while two different requests made from two different script runs give different results, regardless of the delay between the requests. So it can't just be caching on the responding side.

I tried to work around this and wrapped the actual request in a separate script, which I run using exec() from the main script. So there is (should be, I assume) a separate shell instance for each single-page request, and those requests should not share any kind of cache between them. Despite that, I keep getting the same page again. The code looks something like this:

$pages = ...; 
$delay = ...; 
$p = 0;
$command_stub = 'php get_single_page.php';
while ($p < $pages) {
   $command = $command_stub . " $p";
   exec($command, $response);
   // $response is the same again for different $p's
   sleep($delay);
   $p++;
}

If I again change the starting page manually in the script, I get the result for that page all over again, until I change it once more, and so on. Several minutes may pass between two runs of the main script, and it still yields an identical result until I switch the number by hand.

I can't comprehend why this is happening. Can somebody explain it?

The short answer is no. cURL certainly doesn't retain anything between executions unless configured to do so (e.g. by setting a cookie file).

I suspect the server is expecting a session token of some sort (a cookie or some other HTTP header is my guess). Without the session token it will just ignore the request for subsequent pages.
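If the site does track pagination state through a session cookie, reusing one cURL handle with a cookie jar within a single script run usually fixes it. A minimal sketch, assuming the URL, parameter name, and cookie-file path are placeholders rather than details from the question:

```php
<?php
$base_url = 'https://example.com/list?foo=bar'; // placeholder base URL
$pages = 5;   // placeholder page count
$delay = 2;   // placeholder delay in seconds
$jar = '/tmp/cookies.txt'; // placeholder cookie-file path

// Reuse one cURL handle so the session cookie received on the first
// request is sent back on every subsequent one.
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_COOKIEJAR, $jar);  // write cookies here when the handle closes
curl_setopt($ch, CURLOPT_COOKIEFILE, $jar); // read them back for each request

for ($p = 0; $p < $pages; $p++) {
    curl_setopt($ch, CURLOPT_URL, $base_url . "&some_param=$p");
    $html = curl_exec($ch);
    // ... parse $html here ...
    sleep($delay);
}
curl_close($ch);
```

Setting CURLOPT_COOKIEFILE (even to a nonexistent file) is what enables cURL's in-memory cookie engine, so cookies survive across requests made on the same handle.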
