
Where can I find the algorithm used to write each PHP “built-in” function?

I recently built a PHP-based application that typically needs more than 10 seconds to parse a target string, because it runs many thousands of checks on a string that is usually 100 kB or larger. I am looking for ways to reduce the execution time.

I started to wonder how each of PHP's "built-in" functions is written. For example, the strpos() reference page in the manual has a lot of information, but it does not describe the algorithm.

Who knows, maybe I can write a function that is faster than the built-in function for my particular application? But I have no way of knowing the algorithm for, e.g., strpos(). Does the algorithm use a method such as this one:

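function strposHypothetical($haystack, $needle) {

    $haystackLength = strlen($haystack);
    $needleLength   = strlen($needle); // for this question let's assume > 0

    $pos = false;

    // Try every possible starting position in the haystack.
    for ($i = 0; $i < $haystackLength; $i++) {
        for ($j = 0; $j < $needleLength; $j++) {
            $thisSum = $i + $j;
            // Stop when we run off the end of the haystack (valid offsets are
            // 0 .. $haystackLength - 1) or when the characters differ.
            if (($thisSum >= $haystackLength) || ($needle[$j] !== $haystack[$thisSum])) break;
        }
        // The inner loop only runs to completion when every character matched.
        if ($j === $needleLength) {
            $pos = $i;
            break;
        }
    }
    return $pos;
}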

or would it use a much slower method, say a combination of substr_count() to count occurrences of the needle followed by a for loop if the count is greater than 0, or some other method?

I have profiled the functions and methods in my application and made significant progress in this way. Also, note that this post doesn't really help much. Where can I find out the algorithm used for each built-in function in PHP, or is this information proprietary?

The built-in PHP functions can be found in /ext/standard/ in the PHP source code.

In the case of strpos, you can find the PHP implementation in /ext/standard/string.c. At its core, this function uses php_memnstr, which is an alias of zend_memnstr:

found = (char*)php_memnstr(ZSTR_VAL(haystack) + offset,
                           Z_STRVAL_P(needle),
                           Z_STRLEN_P(needle),
                           ZSTR_VAL(haystack) + ZSTR_LEN(haystack));

And if we read the source of zend_memnstr, we can find the algorithm used to implement strpos:

while (p <= end) {
    if ((p = (const char *)memchr(p, *needle, (end-p+1))) && ne == p[needle_len-1]) {
        if (!memcmp(needle, p, needle_len-1)) {
            return p;
        }
    }

    if (p == NULL) {
        return NULL;
    }
    p++;
}

Here, ne holds the last character of needle, and p is a pointer that is advanced through the haystack. Since the loop reads p[needle_len-1], end has to mark the last position at which needle could still start, not the end of the haystack.

memchr is a C standard library function that searches a sequence of bytes for the first occurrence of a given byte (conceptually a simple linear scan) and returns a pointer to it, or NULL if the byte is not present. memcmp is a C standard library function that compares two byte ranges of a given length, which may lie inside larger strings, and returns 0 when they are identical.
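To make the behaviour of these two calls concrete, here is a minimal C sketch. The helper names first_byte_offset and ranges_equal are hypothetical wrappers invented for this illustration; only memchr and memcmp themselves come from the C standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: offset of the first occurrence of byte c in
 * buf[0..len), or -1 if the byte does not occur. */
static long first_byte_offset(const char *buf, size_t len, char c)
{
    assert(buf != NULL);
    const char *hit = memchr(buf, c, len);   /* NULL when c is absent */
    return hit ? (long)(hit - buf) : -1;
}

/* Hypothetical helper: 1 if the first len bytes of a and b are identical,
 * 0 otherwise. memcmp returns 0 for equal ranges. */
static int ranges_equal(const char *a, const char *b, size_t len)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, len) == 0;
}
```

For example, first_byte_offset("hello world", 11, 'o') gives 4, and ranges_equal("world", "worle", 4) gives 1 because only the first four bytes are compared.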

A pseudo-code version of this function is as follows:

while (p <= end) {
    find the next occurrence of the first character of needle, starting at `p`;
    if (occurrence is found) {
        set `p` to point to this new location in the string;
        if ((character at `p` + `length of needle` - 1) == last character of needle) {
            if ((the `length of needle` - 1 characters starting at `p`) == (needle without its last character)) {
                return p; // Found position `p` of needle in haystack!
            }
        }
    } else {
        return NULL; // Needle does not exist in haystack.
    }
    p++;
}
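The same loop can be written as a small standalone C function. This is only a sketch of the technique, not PHP's actual code: the names find_bytes, find_offset, and last_start are made up for this example, and the sketch assumes a non-empty needle, as the question does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of PHP): returns a pointer to the first
 * occurrence of needle in haystack, or NULL if there is none. */
static const char *find_bytes(const char *haystack, size_t haystack_len,
                              const char *needle, size_t needle_len)
{
    assert(needle_len > 0);                 /* assumption: non-empty needle */
    if (needle_len > haystack_len) {
        return NULL;
    }

    const char *p = haystack;
    /* Last position at which a complete match could still start. */
    const char *last_start = haystack + (haystack_len - needle_len);
    const char last_byte = needle[needle_len - 1];

    while (p <= last_start) {
        /* Jump to the next candidate: the next byte equal to needle[0]. */
        p = memchr(p, needle[0], (size_t)(last_start - p) + 1);
        if (p == NULL) {
            return NULL;
        }
        /* Cheap check on the last byte first, then compare the rest. */
        if (p[needle_len - 1] == last_byte &&
            memcmp(p, needle, needle_len - 1) == 0) {
            return p;
        }
        p++;
    }
    return NULL;
}

/* strpos-like wrapper for NUL-terminated strings: byte offset or -1. */
static long find_offset(const char *haystack, const char *needle)
{
    const char *hit = find_bytes(haystack, strlen(haystack),
                                 needle, strlen(needle));
    return hit ? (long)(hit - haystack) : -1;
}
```

With this sketch, find_offset("hello world", "world") gives 6 and find_offset("hello", "xyz") gives -1. Checking the last byte before calling memcmp lets many false candidates be rejected with a single comparison.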

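This is a fairly efficient algorithm for finding the index of a substring in a string. It is essentially the same algorithm as your strposHypothetical, with the extra shortcut of checking the last character of the needle before comparing the rest, and it should be just as efficient complexity-wise: in the worst case the work grows with the haystack length times the needle length. That assumes memcmp returns as soon as it sees the first differing byte, which typical implementations do. And of course, being implemented in C, it will be leaner and faster than the same loop written in PHP.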
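To see what the worst case looks like for a plain nested-loop search such as strposHypothetical, here is a small instrumented C sketch that counts byte comparisons. The function naive_find_counted is hypothetical and written only for this illustration; unlike PHP's strpos, it simply reports -1 for an empty needle.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instrumented nested-loop search. Returns the byte offset of
 * the first match or -1, and stores the number of byte comparisons that
 * were performed in *comparisons. */
static long naive_find_counted(const char *haystack, const char *needle,
                               size_t *comparisons)
{
    assert(comparisons != NULL);
    size_t n = strlen(haystack);
    size_t m = strlen(needle);
    *comparisons = 0;

    if (m == 0 || m > n) {
        return -1;
    }
    /* Only start positions where the whole needle still fits. */
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        while (j < m) {
            (*comparisons)++;
            if (haystack[i + j] != needle[j]) {
                break;              /* early exit on the first mismatch */
            }
            j++;
        }
        if (j == m) {
            return (long)i;
        }
    }
    return -1;
}
```

Searching "aaaaaaaa" for "aab" fails after 18 comparisons (6 start positions, 3 comparisons each), whereas searching "xyzxyzab" for "ab" succeeds after only 8. Inputs with long partial matches are the ones that make this kind of search expensive.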
