
PHP performance: if vs. assignment

I found a similar question here: Performance: condition testing vs assignment

This question is not about optimization. It's about coding preferences.

Here is an example:

I have data that I have no control over. It comes from a 3rd party as rows from a DB table, the result of an MSSQL stored procedure. The data is bloated, so I'd like to reduce its size before transmitting it over the wire as JSON. I can make it about 80% smaller, as most of the data is repetitive.

So I do something like so:

    $processed = array();
    foreach ($result as $row)
    {
        $id = $row['id'];
        $processed[$id]['title'] = $row['title'];
        $processed[$id]['data'] = $row['data'];
        $processed[$id]['stuff'] = $row['stuff'];
        /* many more assignments with different keys */

        $unique = array();
        $unique['cost'] = $row['cost'];
        /* a few more assignments with different keys */

        $processed[$id]['prices'][$row['date']] = $unique;
    }

I thought this might be quicker, but it looks slower (I timed it):

    $processed = array();
    $id = null;
    foreach ($result as $row)
    {
        if ($id != $row['id'])
        {
             $id = $row['id'];
             $processed[$id]['title'] = $row['title'];
             $processed[$id]['data'] = $row['data'];
             $processed[$id]['stuff'] = $row['stuff'];
             /* many more similar lines */
        }

        $unique = array();
        $unique['cost'] = $row['cost'];
        /* a few more similar lines */

        $processed[$id]['prices'][$row['date']] = $unique;
    }

Can anyone confirm that in PHP "if"s or conditionals are indeed more compute-intensive than assignments? Thanks.

[My answer as an edit]

I did some standalone tests (without any real data or other code overhead) on FastCGI PHP running under IIS:

    function testif()
    {
        $i = 0;
        while ($i < 100000000)
        {
            if (1 != 0)  /* do nothing */;
            $i++;
        }

        return "done";
    }

1st run: 20.7496500015256748 sec.

2nd run: 20.8813898563381191 sec.

    function testassign()
    {
        $i = 0;
        while ($i < 100000000)
        {
            $x = "a 26 character long string";
            $i++;
        }

        return "done";
    }

1st run: 21.0238358974455215 sec.

2nd run: 20.7978239059451699 sec.

Well, compared to the time required to transfer this JSON data to the client, such a difference would indeed be a drop in the ocean.
Heck, even JSON encoding alone will perform thousands of such ifs and assignments while encoding your data! Running tests to compare these things IS what you are doing wrong.

It is an extraordinarily limited point of view that leads to such questions.
With zillions of other "CPU cycles" involved, a difference of a thousand will make no difference:

  • there is a web server that handles your request
  • there is a PHP interpreter (which, by default, has to parse your whole code, picking it up character by character)
  • there is a database lookup, which has to handle gigabytes of data
  • there is network latency.

So, to make an adequate comparison, one has to involve all of these matters in their tests, and start worrying only if a real-life test shows any difference. Otherwise it is a complete and utter waste of time.
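As a minimal sketch of what such an end-to-end measurement could look like (the DSN, credentials and stored-procedure name below are placeholders, just to illustrate timing the whole path rather than an isolated construct):

    // Time the whole pipeline: DB fetch, restructuring and JSON encoding together.
    // The connection details and procedure name are hypothetical.
    $start = microtime(true);

    $pdo = new PDO('sqlsrv:Server=localhost;Database=example', 'user', 'pass');
    $result = $pdo->query('EXEC sp_GetPrices')->fetchAll(PDO::FETCH_ASSOC);

    $processed = array();
    foreach ($result as $row) {
        $id = $row['id'];
        $processed[$id]['title'] = $row['title'];
        $processed[$id]['prices'][$row['date']] = array('cost' => $row['cost']);
    }

    $json = json_encode($processed);

    printf("whole pipeline: %.3f sec, %d bytes\n", microtime(true) - $start, strlen($json));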

This kind of question is one of the most evil things in our poor PHP community.
There is nothing bad about being concerned with performance. But there is nothing worse than a "what is faster" question asked just off the top of one's head.

And "it's just a theoretical question!" is not an excuse. these questions never being theoretical, be honest to yourself. One, who REALLY interested in all nitty-gritties, going another way - dealing with sources, debuggers and profilers, not running silly "zillion iterations of nothing" tests.

Someone who is really concerned with speed does measurements first. Such a measurement is called "profiling", and its goal is to find the bottleneck - the thing that REALLY makes your application slower.
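A crude sketch of such a measurement - nothing more than wall-clock time and peak memory logged at the end of the request; a real profiler (Xdebug, XHProf) gives a per-function breakdown, but even this shows where NOT to look:

    // Log total request time and peak memory when the script shuts down.
    $requestStart = microtime(true);

    register_shutdown_function(function () use ($requestStart) {
        error_log(sprintf(
            'request took %.3f sec, peak memory %.1f MB',
            microtime(true) - $requestStart,
            memory_get_peak_usage(true) / 1048576
        ));
    });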

However, sometimes no sophisticated measurement is required, just a little thinking.
For example, if you have too much repetitive data - why not ask your database to return a smaller dataset in the first place, as sketched below?
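For instance, instead of one stored procedure that repeats the title/data columns for every price row, the parent rows and the price rows could be fetched separately. The queries, table and column names below are hypothetical, just to illustrate the idea:

    // Fetch each repeated parent row once, then only the rows that actually vary.
    // Table and column names are placeholders.
    $items = $pdo->query(
        'SELECT DISTINCT id, title, data, stuff FROM items'
    )->fetchAll(PDO::FETCH_ASSOC);

    $prices = $pdo->query(
        'SELECT id, date, cost FROM prices'
    )->fetchAll(PDO::FETCH_ASSOC);

    $processed = array();
    foreach ($items as $row) {
        $processed[$row['id']] = array(
            'title'  => $row['title'],
            'data'   => $row['data'],
            'stuff'  => $row['stuff'],
            'prices' => array(),
        );
    }
    foreach ($prices as $row) {
        $processed[$row['id']]['prices'][$row['date']] = array('cost' => $row['cost']);
    }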

As I already wrote as comment to the first post:

Not related to "performance: if vs assignment", but one way to make textual data much smaller is compressing it (gzip/deflate). You say that most data is repetitive - that means that it would have great compression ratio. Compressing can be enabled globally in server configuration, ie, you don't have to change your script for that.

Compressed "processed data" probably would be somewhat smaller than "full data", though I doubt it could be 80% smaller.


Now about the performance.

Code:

    $time = microtime(true);
    $data = array();
    for ( $n = 0; $n < 25000; ++$n ) {
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
        $data[] = array('id' => $n, 'text' => 'foo bar', 'key1' => 'value1', 'key2' => 'value2', 'key3' => 'value3');
    }
    printf("%.05f\n\n", microtime(true) - $time);

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        foreach ( $data as $row ) {
            $id = $row['id'];
            $tmp[$id]['text'] = $row['text'];
            $tmp[$id]['key1'] = $row['key1'];
            $tmp[$id]['key2'] = $row['key2'];
            $tmp[$id]['key3'] = $row['key3'];
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        $id = null;
        foreach ( $data as $row ) {
            if ( $row['id'] !== $id ) {
                $id = $row['id'];
                $tmp[$id]['text'] = $row['text'];
                $tmp[$id]['key1'] = $row['key1'];
                $tmp[$id]['key2'] = $row['key2'];
                $tmp[$id]['key3'] = $row['key3'];
            }
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

    for ( $n = 0; $n < 10; ++$n ) {
        $time = microtime(true);
        $tmp = array();
        foreach ( $data as $row ) {
            if ( !isset($tmp[$row['id']]) ) {
                $id = $row['id'];
                $tmp[$id]['text'] = $row['text'];
                $tmp[$id]['key1'] = $row['key1'];
                $tmp[$id]['key2'] = $row['key2'];
                $tmp[$id]['key3'] = $row['key3'];
            }
        }
        printf("%.05f\n", microtime(true) - $time);
    }
    echo "\n";

Results:

0.26685; 0.32710; 0.30996; 0.31132; 0.31148; 0.31072; 0.31036; 0.31082; 0.30957; 0.30952; 
0.21155; 0.21114; 0.21132; 0.21119; 0.21042; 0.21128; 0.21176; 0.21075; 0.21139; 0.21703; 
0.21596; 0.21576; 0.21728; 0.21720; 0.21610; 0.21586; 0.21635; 0.22057; 0.21635; 0.21888; 

I'm not sure why, but the first timing of the first test is consistently smaller than the other timings for the same test (0.26-0.27 vs 0.31-0.32). Other than that, it seems to me that it is worth checking whether the row already exists.

I believe that conditionals are slower in any language. This is related to how the compiler and CPU interact with the code. The CPU looks at the opcodes generated by the compiler and tries to pre-fetch future instructions into cache. If you are branching, it might not be able to cache the next instruction. I think there's a rule of thumb that the code block most likely to execute should go in the "if" part, and the case that comes up less often in the "else" block.

I did a quick Google search and there was another related question/answer on StackOverflow a while back: Effects of branch prediction on performance?
