
Why aren't string literals passed as references to arrays instead of opaque pointers?

In C++, the type of a string literal is const char[N], where N (a std::size_t) is the number of characters plus one for the zero-byte terminator. String literals reside in static storage and are available from program initialization to termination.

Often, functions taking a constant string don't need the interface of std::basic_string, or would prefer to avoid dynamic allocation; they may just need, for instance, the string itself and its length. std::basic_string, in particular, has to offer a way to be constructed from the language's native string literals. Such functions offer a variant that takes a C-style string:

void function_that_takes_a_constant_string ( const char * /*const*/ s );

// Array-to-pointer decay happens, and takes away the string's length
function_that_takes_a_constant_string( "Hello, World!" );

As explained in this answer, arrays decay to pointers, and their dimensions are lost in the process. In the case of string literals, this means that their length, which was known at compile time, is lost and must be recalculated at run time by iterating through the pointed-to memory until a zero byte is found. This is not optimal.

However, string literals, and, in general, arrays, may be passed as references using template parameter deduction to keep their size:

template<std::size_t N>
void function_that_takes_a_constant_string ( const char (& s)[N] );

// Transparent, and the string's length is kept
function_that_takes_a_constant_string( "Hello, World!" );

The function template could serve as a proxy for another function, the real one, which would take a pointer to the string and its length. That way the implementation would not have to be exposed, and the length would still be preserved.

// Calling the wrapped function directly would be cumbersome.
// This wrapper is transparent and preserves the string's length.
template<std::size_t N> inline auto
function_that_takes_a_constant_string
( const char (& s)[N] )
{
    // `s` decays to a pointer
    // `N-1` is the length of the string
    return function_that_takes_a_constant_string_private_impl( s , N-1 );
}

// Isn't everyone happy now?
function_that_takes_a_constant_string( "Hello, World!" );

Why isn't this used more broadly? In particular, why doesn't std::basic_string have a constructor with the proposed signature?


Note: I don't know what the proposed kind of parameter is called; if you do, please suggest an edit to the question's title.

The trouble with adding such a templated overload is simple:

It would be used whenever the function is called with a static buffer of char type, even if the buffer as a whole is not a string and you really wanted to pass only the initial string (embedded zeroes are far less common than terminating zeroes, and using only part of a buffer is very common). Existing code rarely decays an array to a pointer to its first element explicitly, with a cast or a function call, so such calls would silently pick the new overload.

Demo code (on Coliru):

#include <stdio.h>
#include <string.h>

auto f(const char* s, size_t n) {
    printf("char* size_t %u\n", (unsigned)n);
    (void)s;
}
auto f(const char* s) {
    printf("char*\n");
    return f(s, strlen(s));
}
template<size_t N> inline auto
f( const char (& s)[N] ) {
    printf("char[%u]\n", (unsigned)N);
    return f(s, N-1);
}

int main() {
    char buffer[] = "Hello World";
    f(buffer);
    f(+buffer);
    buffer[5] = 0;
    f(buffer);
    f(+buffer);
}

Keep in mind: If you talk about a string in C, it always denotes a 0-terminated string, while in C++ it can also denote a std::string , which is counted.

It's largely historical, in a sense. While you're correct that there's no real reason this can't be done (if you don't want to use your whole buffer, pass a length argument, right?) it's still true that if you have a character array it's usually a buffer not all of which you're using at any one time:

char buf[MAX_LEN];

Since this is usually how they're used, it seems needless or even risky to go to the trouble of adding a new basic_string constructor template for const CharT (&)[N] .

The whole thing is pretty borderline though.

I believe this is being addressed in C++14, building on user-defined string literals:

http://en.cppreference.com/w/cpp/string/basic_string/operator%22%22s

#include <string>

int main()
{
    //no need to write 'using namespace std::literals::string_literals'
    using namespace std::string_literals;

    std::string s2 = "abc\0\0def";  // forms the string "abc"
    std::string s1 = "abc\0\0def"s; // forms the string "abc\0\0def"
}

You can create a helper class that fixes this without adding an overload to every function:

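#include <cstddef>
#include <string>

struct string_view
{
    const char* ptr;
    std::size_t size; // counts the terminating '\0' in both constructors
    template<std::size_t N>
    string_view(const char (&s)[N])
    {
        ptr = s;
        size = N; // N already includes the '\0'
    }
    string_view(const std::string& s)
    {
        ptr = s.data();
        size = s.size() + 1; // for '\0' at end
    }
};
void f(string_view) {} // placeholder body so the example links
int main()
{
    string_view s { "Hello world!" };
    f(s);
    f("test");
}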

You should extend this class with helper functions (such as begin and end) to simplify its use in your program.
