Pulling data from API, memory growth
I'm working on a project where I pull JSON data from an API. The problem I'm having is that memory slowly grows until I hit the dreaded fatal error:

Fatal error: Allowed memory size of * bytes exhausted (tried to allocate * bytes) in C:... on line *

I don't think there should be any memory growth. I tried unsetting everything at the end of the loop, but it made no difference. So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?
<?php
$start = microtime(true);
$time = microtime(true) - $start;
echo "Start: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

include('start.php');
include('connect.php');

set_time_limit(0);

$api_key = 'API-KEY';
$tier = 'Platinum';
$threads = 10; // number of URLs called simultaneously

function multiRequest($urls, $start) {
    $time = microtime(true) - $start;
    echo " start function: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    $nbrURLS = count($urls); // number of URLs in array $urls
    $ch = array();           // array of cURL handles
    $result = array();       // data to be returned

    $mh = curl_multi_init(); // create a multi handle
    $time = microtime(true) - $start;
    echo " Creation multi handle: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    // set URL and other appropriate options
    for ($i = 0; $i < $nbrURLS; $i++) {
        $ch[$i] = curl_init();
        curl_setopt($ch[$i], CURLOPT_URL, $urls[$i]);
        curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, 1); // return data as a string
        curl_setopt($ch[$i], CURLOPT_SSL_VERIFYPEER, 0); // don't verify the certificate
        curl_multi_add_handle($mh, $ch[$i]); // add a normal cURL handle to the multi handle
    }

    $time = microtime(true) - $start;
    echo " For loop options: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    // execute the handles
    do {
        $mrc = curl_multi_exec($mh, $active);
        curl_multi_select($mh, 0.1); // without this, we busy-loop here and use 100% CPU
    } while ($active);

    $time = microtime(true) - $start;
    echo " Execution: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";
    echo ' For loop2<br>';

    // get content and remove handles
    for ($i = 0; $i < $nbrURLS; $i++) {
        $error = curl_getinfo($ch[$i], CURLINFO_HTTP_CODE); // last received HTTP code
        echo " error: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

        // error handling if not a 200 OK code
        if ($error != 200) {
            if ($error == 429 || $error == 500 || $error == 503 || $error == 504) {
                echo "Again error: $error<br>";
                $result['again'][] = $urls[$i];
            } else {
                echo "Error error: $error<br>";
                $result['errors'][] = array("Url" => $urls[$i], "errornbr" => $error);
            }
        } else {
            $result['json'][] = curl_multi_getcontent($ch[$i]);
            echo " Content: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";
        }

        curl_multi_remove_handle($mh, $ch[$i]);
        curl_close($ch[$i]);
    }

    $time = microtime(true) - $start;
    echo " after loop2: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br>";

    curl_multi_close($mh);
    return $result;
}
$gamesId = mysqli_query($connect, "SELECT gameId FROM `games` WHERE `region` = 'EUW1' AND `tier` = '$tier' LIMIT 20");

$urls = array();
while ($result = mysqli_fetch_array($gamesId)) {
    $urls[] = 'https://euw.api.pvp.net/api/lol/euw/v2.2/match/' . $result['gameId'] . '?includeTimeline=true&api_key=' . $api_key;
}

$time = microtime(true) - $start;
echo "After URL array: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

$x = 1; // number of loops
while ($urls) {
    $chunk = array_splice($urls, 0, $threads); // take the first chunk ($threads) of all URLs
    $time = microtime(true) - $start;
    echo "<br>After chunk: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";

    $result = multiRequest($chunk, $start); // get JSON
    unset($chunk);

    $nbrComplete = count($result['json']); // number of returned JSON strings
    echo 'For loop: <br/>';
    for ($y = 0; $y < $nbrComplete; $y++) {
        // parse the JSON
        $decoded = json_decode($result['json'][$y], true);
        $time = microtime(true) - $start;
        echo " Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "<br/>";
    }
    unset($nbrComplete);
    unset($decoded);

    $time = microtime(true) - $start;
    echo $x . ": " . memory_get_peak_usage(true) . " | " . $time . "<br>";

    // requeue URLs that need to be retried
    if (isset($result['again'])) {
        $urls = array_merge($urls, $result['again']);
        unset($result['again']);
    }
    unset($result);
    unset($time);

    sleep(15); // limit the request rate
    $x++;
}

include('end.php');
?>
PHP Version 5.3.9 - 100 loops:
loop: memory | time (sec)
1: 5505024 | 0.98330211639404
3: 6291456 | 33.190237045288
65: 6553600 | 1032.1401019096
73: 6815744 | 1160.4345710278
75: 7077888 | 1192.6274609566
100: 7077888 | 1595.2397520542
EDIT:
After trying it with PHP 5.6.14 (XAMPP on Windows):
loop: memory | time (sec)
1: 5505024 | 1.0365679264069
3: 6291456 | 33.604479074478
60: 6553600 | 945.90159296989
62: 6815744 | 977.82566595078
93: 7077888 | 1474.5941500664
94: 7340032 | 1490.6698410511
100: 7340032 | 1587.2434458733
EDIT2: I only see the memory increase after json_decode:
Start: 262144 | 135448
After URL array: 262144 | 151984
After chunk: 262144 | 152272
start function: 262144 | 152464
Creation multi handle: 262144 | 152816
For loop options: 262144 | 161424
Execution: 3145728 | 1943472
For loop2
error: 3145728 | 1943520
Content: 3145728 | 2095056
error: 3145728 | 1938952
Content: 3145728 | 2131992
error: 3145728 | 1938072
Content: 3145728 | 2135424
error: 3145728 | 1933288
Content: 3145728 | 2062312
error: 3145728 | 1928504
Content: 3145728 | 2124360
error: 3145728 | 1923720
Content: 3145728 | 2089768
error: 3145728 | 1918936
Content: 3145728 | 2100768
error: 3145728 | 1914152
Content: 3145728 | 2089272
error: 3145728 | 1909368
Content: 3145728 | 2067184
error: 3145728 | 1904616
Content: 3145728 | 2102976
after loop2: 3145728 | 1899824
For loop:
Decode: 3670016 | 2962208
Decode: 4980736 | 3241232
Decode: 5242880 | 3273808
Decode: 5242880 | 2802024
Decode: 5242880 | 3258152
Decode: 5242880 | 3057816
Decode: 5242880 | 3169160
Decode: 5242880 | 3122360
Decode: 5242880 | 3004216
Decode: 5242880 | 3277304
Your method is quite long, so garbage collection won't get fired until the very end of the function, which means your unused variables can build up. If they weren't going to be used anymore, garbage collection would take care of this for you.

You might think about refactoring this code into smaller methods to take advantage of this, along with all the other good things that come with smaller methods. In the meantime, you could try putting gc_collect_cycles(); at the very end of your loop to see if you can free some memory:
if (isset($result['again'])) {
    $urls = array_merge($urls, $result['again']);
    unset($result['again']);
}
unset($result);
unset($time);
gc_collect_cycles(); // add this line here
sleep(15); // limit the request rate
Edit: the segment I updated actually doesn't belong to the big function; however, I suspect the size of $result may bowl things over, and it possibly won't get cleaned up until the loop terminates. This is worth a try, however.
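To illustrate the refactoring idea above, here is a minimal sketch (the function name and fields are hypothetical, not from the asker's code): once decode-and-process lives in its own small function, its locals go out of scope on every return, so the engine can reclaim them without waiting for the big loop to finish.

```php
<?php
// Hypothetical helper: locals ($decoded) are freed when the function
// returns, instead of lingering until the end of one long method.
function processJson($json) {
    $decoded = json_decode($json, true);
    if ($decoded === null) {
        return null; // invalid JSON
    }
    // Return only the small piece you need; the full array is discarded.
    return count($decoded);
}

var_dump(processJson('{"a":1,"b":2}')); // int(2)
```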
So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?

Yes, running out of memory is normal when you use all of it. You are making 10 simultaneous HTTP requests and unserializing the JSON responses into PHP memory. Without limiting the size of the responses you will always be in danger of running out of memory.
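One way to keep at most one decoded payload alive at a time is to decode, extract, and discard inside the loop. This is a sketch, assuming you only need a few fields from each response; $result mimics the question's structure and the 'matchId' field is hypothetical.

```php
<?php
// Sketch: process each response immediately and free both the raw JSON
// string and the decoded array before moving on to the next one.
$result = array('json' => array('{"matchId":111}', '{"matchId":222}'));

$ids = array();
foreach ($result['json'] as $i => $json) {
    $decoded = json_decode($json, true);
    if ($decoded !== null && isset($decoded['matchId'])) {
        $ids[] = $decoded['matchId']; // keep only the field you need
    }
    unset($result['json'][$i], $decoded); // free raw string and full array
}

echo implode(',', $ids); // 111,222
```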
What else can you do?

Turn $threads down to 1 to test this. If there is a memory leak in a C extension, calling gc_collect_cycles() will not free any memory; that only affects memory allocated in the Zend Engine which is no longer reachable.
So my question is: am I doing something wrong? Is it normal? What can I do to fix this problem?
There is nothing wrong with your code, because this is the normal behaviour: you are requesting data from an external source, which in turn is loaded into memory.

Of course a solution to your problem could be as simple as:

ini_set('memory_limit', -1);

which allows all the memory needed to be used.
When I'm using dummy content, the memory usage stays the same between requests. This is using PHP 5.5.19 in XAMPP on Windows.

There was a cURL-related memory leak bug which was fixed in version 5.5.4.
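Since that fix landed in 5.5.4, it may be worth confirming the PHP and libcurl versions in play before hunting for a leak in your own code. A quick check using only standard functions:

```php
<?php
// Report the running PHP and libcurl versions; the 5.5.4 threshold
// refers to the cURL leak fix mentioned above.
$curl = curl_version();
echo 'PHP ' . PHP_VERSION . ', libcurl ' . $curl['version'] . "\n";

if (version_compare(PHP_VERSION, '5.5.4', '<')) {
    echo "PHP older than 5.5.4: the fixed cURL leak may still affect you\n";
}
```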
I tested your script on 10 URLs. I removed all your comments except one at the end of the script and one in the problem loop where json_decode is used. I also opened one of the pages that you decode from the API; it is a very big array, and I think you're right: you have an issue in json_decode.
Results and fixes.

Result without changes:

Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
Result:
Decode: 3407872 | 2947584
Decode: 3932160 | 2183872
Decode: 3932160 | 2491440
Decode: 4980736 | 3291288
Decode: 6291456 | 3835848
Decode: 6291456 | 2676760
Decode: 6291456 | 4249376
Decode: 6291456 | 2832080
Decode: 6291456 | 4081888
Decode: 6291456 | 3214112
Decode: 6291456 | 244400
Result with unset($decoded):

Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    unset($decoded);
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
Result:
Decode: 3407872 | 1573296
Decode: 3407872 | 1573296
Decode: 3407872 | 1573296
Decode: 3932160 | 1573296
Decode: 4456448 | 1573296
Decode: 4456448 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 1573296
Decode: 4980736 | 244448
Also you can add gc_collect_cycles:

Code:
for ($y = 0; $y < $nbrComplete; $y++) {
    $decoded = json_decode($result['json'][$y], true);
    unset($decoded);
    gc_collect_cycles();
    $time = microtime(true) - $start;
    echo "Decode: " . memory_get_peak_usage(true) . " | " . memory_get_usage() . "\n";
}
It can help in some cases, but it can also come at the cost of performance degradation.
You can try re-running the script with unset, and with unset+gc, and report back if you still have the same issue after the changes.
Also, I don't see where you use the $decoded variable; if that's a mistake in the code, you can remove json_decode :)
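To see the effect unset() has in isolation, here is a self-contained sketch using only memory_get_usage(); exact numbers vary by PHP version and build, but the reported usage should rise after decoding and fall back after the unset.

```php
<?php
// Decode a reasonably large JSON blob, then unset the result and
// compare the reported memory usage at each step.
$json = json_encode(array_fill(0, 10000, 'x'));

$before = memory_get_usage();
$decoded = json_decode($json, true);
$during = memory_get_usage();
unset($decoded);
$after = memory_get_usage();

echo "before: $before, during: $during, after: $after\n";
// Expect $during > $before, and $after close to $before again.
```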