
PHP - for get_headers($url, 1), are the keys for status codes *always* integers?

Looking at the PHP docs for get_headers() ...

array get_headers ( string $url [, int $format = 0 ] )

... there are two ways to run it:

#1 ( format === 0 )

$headers = get_headers($url);

// or

$headers = get_headers($url, 0);

#2 ( format !== 0 )

$headers = get_headers($url, 1);

The difference between the two is whether the array is numerically indexed (first case)...

(excerpt from docs)

Array
(
    [0] => HTTP/1.1 200 OK
    [1] => Date: Sat, 29 May 2004 12:28:13 GMT
    [2] => Server: Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    ... etc

... or indexed with keys (second case)...

(excerpt from docs)

Array
(
    [0] => HTTP/1.1 200 OK
    [Date] => Sat, 29 May 2004 12:28:14 GMT
    [Server] => Apache/1.3.27 (Unix)  (Red-Hat/Linux)
    [Last-Modified] => Wed, 08 Jan 2003 23:11:55 GMT
    ... etc

In the example given in the docs, the http status code belongs to a numerical index...

[0] => HTTP/1.1 200 OK

... regardless of what format is set to.

Similarly, for every valid URL that I have ever put through get_headers (i.e. many URLs), the status codes have always been under numerical indexes, even when multiple status codes are present...

// Output from JSON.stringify(get_headers($url, 1))

{
    "0": "HTTP/1.1 301 Moved Permanently",
    "1": "HTTP/1.1 200 OK",
    "Date": [
        "Thu, 11 Aug 2016 07:12:28 GMT",
        "Thu, 11 Aug 2016 07:12:28 GMT"
    ],
    "Content-Type": [
        "text/html; charset=iso-8859-1",
        "text/html; charset=UTF-8"
    ]
    ... etc

But I haven't tested (read: can't test) every URL on every type of server, so I can't speak in absolutes about the status-code indexes.

Is it possible that get_headers($url, 1) could return an HTTP status line under a non-numeric key? Or is the function hard-coded to always return the status lines under numerical indices, no matter what?


Extra reading, not necessary or essential to the question above...

For the curious, my question is mostly about optimization. get_headers() is already painfully slow, even when sending a HEAD request instead of GET, and it only gets worse after combing through the returned array with preg_match and a regex.

(The various cURL methods you'll find are even slower; I've tested them against get_headers() with very long lists of URLs, so holster that hip-shot, partner.)
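For reference, a minimal sketch of how get_headers() can be made to send HEAD instead of GET, as mentioned above. It assumes PHP 7.1 or later, where get_headers() accepts a stream context as its third argument; the helper names headContext() and headHeaders() and the 10-second timeout are illustrative, not taken from the question.

```php
<?php
// Sketch: a stream context that makes the http:// wrapper send HEAD
// instead of GET. Requires PHP >= 7.1 for get_headers()'s $context argument.
function headContext()
{
    return stream_context_create([
        'http' => [
            'method'  => 'HEAD',
            'timeout' => 10, // seconds; illustrative value, not from the question
        ],
    ]);
}

// Hypothetical helper: headers of $url in the keyed format ($format = 1),
// fetched with a HEAD request. Returns false on failure, like get_headers().
function headHeaders(string $url)
{
    return get_headers($url, 1, headContext());
}

// Usage (performs a real network request; example.com is a placeholder):
// $headers = headHeaders('https://example.com/');
```

On older PHP versions without the $context argument, a common workaround is stream_context_set_default() with the same options, which changes the default context for every later stream operation in the script.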

If I know that the status codes are always numerically indexed, I can speed my code up a bit by skipping all non-integer indices before running the rest through preg_match. The difference for one URL might only be a fraction of a second, but when running this function all day, every day, those little bits add up.
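A minimal sketch of that shortcut, assuming (as the question does) that status lines only ever appear under integer keys. The function name statusCodesFromIntKeys() and the sample array are hypothetical, and no speedup is claimed here without measuring.

```php
<?php
// Sketch: collect status codes from a get_headers($url, 1) result by looking
// only at integer keys. ASSUMPTION: status lines never sit under string keys.
function statusCodesFromIntKeys(array $headers): array
{
    $codes = [];
    foreach ($headers as $key => $value) {
        // Skip named headers such as "Date" or "Content-Type" without a regex.
        if (!is_int($key) || !is_string($value)) {
            continue;
        }
        // "HTTP/1.1 301 Moved Permanently" -> 301
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $value, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Sample input shaped like the JSON output in the question.
$headers = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    1 => 'HTTP/1.1 200 OK',
    'Date' => ['Thu, 11 Aug 2016 07:12:28 GMT', 'Thu, 11 Aug 2016 07:12:28 GMT'],
    'Content-Type' => ['text/html; charset=iso-8859-1', 'text/html; charset=UTF-8'],
];
$codes = statusCodesFromIntKeys($headers); // [301, 200]
```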

Additionally (Edit #1)

I'm currently only concerned with the final HTTP status code (and URL) after all redirects. I was using a method similar to this to get the final URL.

It seems that after running

$headers = array_reverse($headers);

then the final status code after the redirects will always be in $headers[0] (array_reverse() renumbers integer keys by default, so the last status line ends up at key 0). But, once again, that is only a sure thing if the status codes are numerically indexed.
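If integer keys are reliable, the final status line can also be read without building a reversed copy of the array. This sketch assumes, as the C source quoted below suggests, that unnamed lines are appended under increasing integer keys in arrival order; finalStatusLine() and the sample array are hypothetical.

```php
<?php
// Sketch: the last status line in a get_headers($url, 1) result, i.e. the
// final status after redirects. ASSUMPTION: status lines are appended in
// arrival order under increasing integer keys, so the last integer-keyed
// entry met while iterating is the final one.
function finalStatusLine(array $headers): ?string
{
    $last = null;
    foreach ($headers as $key => $value) {
        if (is_int($key) && is_string($value)) {
            $last = $value; // later matches overwrite earlier ones
        }
    }
    return $last;
}

// Hypothetical result for a URL that redirects once, in insertion order.
$headers = [
    0 => 'HTTP/1.1 301 Moved Permanently',
    'Date' => ['Thu, 11 Aug 2016 07:12:28 GMT', 'Thu, 11 Aug 2016 07:12:28 GMT'],
    'Location' => 'https://example.com/',
    1 => 'HTTP/1.1 200 OK',
    'Content-Type' => 'text/html; charset=UTF-8',
];
$final = finalStatusLine($headers); // 'HTTP/1.1 200 OK'

// Same answer as the array_reverse() approach: array_reverse() renumbers
// integer keys by default, so the last status line becomes key 0.
$viaReverse = array_reverse($headers)[0]; // 'HTTP/1.1 200 OK'
```

The loop visits each entry once and does not allocate a reversed copy; whether that is measurably faster on your URL lists is something only a benchmark can tell.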

Answer

The PHP C source code for that function looks like this:

        if (!format) {
no_name_header:
            add_next_index_str(return_value, zend_string_copy(Z_STR_P(hdr)));
        } else {
            char c;
            char *s, *p;

            if ((p = strchr(Z_STRVAL_P(hdr), ':'))) {
                ... omitted ...
            } else {
                goto no_name_header;
            }
        }

In other words, it checks whether there's a ":" in the header line and, if so, indexes the line by its name (the omitted part). If there's no ":", or if you did not request a formatted result via $format, execution jumps to no_name_header, which appends the line to return_value with add_next_index_str(), i.e. under the next integer index.
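To experiment with this logic offline, here is a simplified PHP model of the branch above. It is not the real function: indexHeaders() is a hypothetical name, the handling of repeated names is modeled on the "Date" and "Content-Type" arrays in the question's output, and edge cases such as header names that look like integers may not behave exactly as in C.

```php
<?php
// Simplified PHP model of get_headers()'s indexing branch (NOT the real
// implementation): lines with a ':' are keyed by their name when $format is
// true; everything else is appended under the next integer index.
function indexHeaders(array $lines, bool $format): array
{
    $result = [];
    foreach ($lines as $line) {
        $colon = strpos($line, ':');
        if (!$format || $colon === false) {
            $result[] = $line; // the "no_name_header" path
            continue;
        }
        $name  = substr($line, 0, $colon);
        // Skip whitespace after the colon, similar to an isspace() loop.
        $value = ltrim(substr($line, $colon + 1), " \t\n\r\v\f");
        if (!array_key_exists($name, $result)) {
            $result[$name] = $value;
        } else {
            // Repeated header: turn the entry into a list and append to it.
            $result[$name] = (array) $result[$name];
            $result[$name][] = $value;
        }
    }
    return $result;
}

// Hypothetical raw header lines for a single redirect.
$lines = [
    'HTTP/1.1 301 Moved Permanently',
    'Date: Thu, 11 Aug 2016 07:12:28 GMT',
    'Location: https://example.com/',
    'HTTP/1.1 200 OK',
    'Date: Thu, 11 Aug 2016 07:12:28 GMT',
];
$keyed = indexHeaders($lines, true);
// $keyed[0] and $keyed[1] hold the two status lines; $keyed['Date'] is a list.
```

Feeding it a status line such as 'HTTP/1.1 200 OK: fine' shows the edge case discussed next: the line is split at the colon and lands under the string key 'HTTP/1.1 200 OK'.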

So, yes, the status lines should always be numerically indexed, unless the server puts a ":" into the status line, which would be unusual. Note that RFC 2616 does not explicitly prohibit a ":" in the reason-phrase part of the status line:

Status-Line    = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

Reason-Phrase  = *<TEXT, excluding CR, LF>

TEXT           = <any OCTET except CTLs,
                 but including LWS>

There is no standardised reason phrase that contains a ":", but you never know; you may encounter exotic servers in the wild that defy convention here…
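If you would rather not depend on key types at all, a defensive option is to recognize status lines by their shape. Under the logic quoted above, a status line containing ":" would be split at that colon, so the status text would become a string key instead of a value. statusCodes() and the sample arrays are hypothetical, and the pattern only covers lines starting with HTTP/, a version, and a three-digit code.

```php
<?php
// Defensive sketch: find status codes in a get_headers($url, 1) result
// without trusting key types. Normally a status line is a value under an
// integer key; if a server put ':' in the reason phrase, the line would be
// split at that colon and the status text would become a string key.
function statusCodes(array $headers): array
{
    $pattern = '#^HTTP/\S+\s+(\d{3})#';
    $codes = [];
    foreach ($headers as $key => $value) {
        // Integer key: inspect the value. String key: inspect the key itself.
        $candidate = is_int($key) ? $value : $key;
        if (is_string($candidate) && preg_match($pattern, $candidate, $m)) {
            $codes[] = (int) $m[1];
        }
    }
    return $codes;
}

// Hypothetical result from an exotic server sending "HTTP/1.1 200 OK: all good".
$exotic = [
    'HTTP/1.1 200 OK' => 'all good',
    'Date' => 'Thu, 11 Aug 2016 07:12:28 GMT',
];
$codes = statusCodes($exotic); // [200]
```

The trade-off is one extra preg_match per named header, which is exactly the cost the question hopes to avoid; whether it matters in practice is a benchmarking question.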

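Another answer

Since the first response status line is always at index 0, you could copy it to a named key and discard the original key. (After redirects, [0] holds the first status line, e.g. the 301, not the final one.)

$headers = get_headers($url, 1);
if ($headers !== false) {
    // move the first status line from key 0 to a named key
    $headers['Http-Response'] = $headers[0];
    unset($headers[0]);
}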
