
Optimizing big array manipulation in PHP inside a function

I want to modify a big array inside a function, so I'm fairly sure I need to pass it by reference, but I'm not sure which of these two alternatives is better (more performant, and with fewer possible side effects):

Option 1:

$array1 = getSomeBigArray();
$array2 = getAnotherBigArray();
$results[] = combineArrays($array1, $array2);

function combineArrays(&$array1, $array2){
   // this is not important, just example of modification
   foreach($array2 as $value){
      if($value > 0){
         $array1[] = $value;
      }
   }
   return $array1;    // will returning $array1 make a copy?
}

Option 2:

$array1 = getSomeBigArray();
$array2 = getAnotherBigArray();
combineArrays($array1, $array2);
$results[] = $array1;

function combineArrays(&$array1, $array2){

   foreach($array2 as $value){
      if($value > 0){
         $array1[] = $value;
      }
   }
   // void function
}

EDIT: I have run some tests and now I'm more confused.
This is the test:
https://ideone.com/v7sepC
From those results, it seems faster not to use references at all, and when references are used, option 1 (with return) is faster.
But in my local environment, using references seems to be slightly faster.

EDIT 2:
Maybe there is a problem with ideone.com, because running the same code here:
https://3v4l.org/LaffP
gives a different result: Option 1 and Option 2 (references) are almost equal, and both are faster than passing by value.

When you do return $array1; (in the first option), PHP does not copy the array. It only increments the reference counter and returns a handle to the same array.

That is, the function's return value and $array1 point to the same array in memory until you modify either of them; only at that moment is the data actually copied (copy-on-write).

The same happens with the assignment $results[] = $array1; no data is copied, and the new element of $results simply refers to the same array.

In the end, both options have the same result: the variable $array1 and the last item of $results refer to the same data. Therefore, there is no notable performance difference between the two options.
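The copy-on-write behaviour described above can be observed with memory_get_usage(). The following is a minimal sketch, assuming PHP 7 or later; the variable names and the array size are illustrative and are not taken from the question.

```php
<?php
// Illustrative sketch: observe copy-on-write via memory_get_usage().
$original = range(1, 100000);

$before = memory_get_usage();
$alias = $original;               // assignment only: both names share one array
$afterAssign = memory_get_usage();

$alias[] = 0;                     // first write: the data is duplicated here
$afterWrite = memory_get_usage();

echo "assignment cost:  ", $afterAssign - $before, " bytes\n";
echo "first write cost: ", $afterWrite - $afterAssign, " bytes\n";
```

The assignment should cost next to nothing, while the first write should cost roughly the size of the whole array. Exact byte counts depend on the PHP version and build.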

Also, consider using native functions for typical operations, e.g. array_merge().
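For the specific loop in the question, one possible native equivalent combines array_filter() with array_merge(). This is only a sketch, assuming both arrays are plain lists: array_merge() renumbers integer keys and returns a new array rather than modifying $array1 in place. The sample values are made up.

```php
<?php
$array1 = [1, 2, 3];
$array2 = [-1, 4, 0, 5];

// Keep only the positive values of $array2 (array_filter preserves keys).
$positives = array_filter($array2, function ($value) {
    return $value > 0;
});

// array_merge() renumbers integer keys, so the result is a clean list.
$combined = array_merge($array1, $positives);

print_r($combined);
```

Whether this is faster than the explicit loop for very large arrays would need to be measured, since it builds an intermediate array.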

code 1: 1000000 values, resources: 32

code 1: 10000000 values, resources: 67

code 2: 1000000 values, resources: 27

code 2: 2000000 values, resources: 49

I measured the resource usage by calling getrusage(), and code 2 seems to be more performant. You can use the following code to run some tests yourself:

<?php 

function getSomeBigArray() {
    $arr = [];
    for ($i=0;$i<2000000;$i++) {
        $arr[] = $i;
    }   
    return $arr;
}

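// Difference in milliseconds between two getrusage() snapshots.
// $index is "utime" (user CPU time) or "stime" (system CPU time).
function rutime($ru, $rus, $index) {
    return ($ru["ru_$index.tv_sec"]*1000 + intval($ru["ru_$index.tv_usec"]/1000))
         - ($rus["ru_$index.tv_sec"]*1000 + intval($rus["ru_$index.tv_usec"]/1000));
}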

$array1 = getSomeBigArray();
$array2 = getSomeBigArray();

$rustart = getrusage();
$results[] = combineArrays($array1, $array2);
$ru = getrusage();

echo rutime($ru, $rustart, "utime");

function combineArrays(&$array1, $array2){
    // The array combining method.
}

Note: the rutime function was copied from the accepted answer to the following Stack Overflow post: Tracking the script execution time in PHP
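To compare passing by reference with passing by value directly, a self-contained harness along the following lines may help. It is a sketch, assuming PHP 7.1 or later. The names combineByReference, combineByValue and timeMs are hypothetical helpers introduced here, and it uses wall-clock time from microtime() instead of getrusage(). Single runs are noisy, so repeat the measurement before drawing conclusions.

```php
<?php
// Hypothetical helper: appends positive values in place (by reference).
function combineByReference(array &$target, array $source): void {
    foreach ($source as $value) {
        if ($value > 0) {
            $target[] = $value;
        }
    }
}

// Hypothetical helper: same logic, but works on a by-value copy and returns it.
function combineByValue(array $target, array $source): array {
    foreach ($source as $value) {
        if ($value > 0) {
            $target[] = $value;
        }
    }
    return $target;
}

// Hypothetical helper: wall-clock duration of a callable in milliseconds.
function timeMs(callable $fn): float {
    $start = microtime(true);
    $fn();
    return (microtime(true) - $start) * 1000;
}

$source = range(-1000, 100000);
$byRef = range(1, 100000);
$byVal = range(1, 100000);

$refMs = timeMs(function () use (&$byRef, $source) {
    combineByReference($byRef, $source);
});
$valMs = timeMs(function () use (&$byVal, $source) {
    $byVal = combineByValue($byVal, $source);
});

printf("by reference: %.2f ms\nby value:     %.2f ms\n", $refMs, $valMs);
```

Both variants end up with the same array contents, so any difference in the printed times reflects the calling convention and not the work done.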
