
Writing a string compare function

I've written the following string-comparison function to evaluate whether two strings are the same or not:

bool s_compare(char* str1, char* str2)
{
    // if its the same pointer can return immediately
    // also covers the case where both are null pointers
    if (str1 == str2) 
        return true;

    // if str1 is empty and str2 is non-empty
    size_t len = strlen(str1);
    if (!len && str2[0] != 0)
        return false;

    // see if for length of str1 > 1, all letters in str2 match
    // we also go up through the nul char to make sure the size
    // of both strings is the same
    for (int i=0; i<=len; i++)
        if (str1[i] != str2[i]) 
            return false;

    return true;
}

Are there any places where the efficiency can be improved? What other considerations need to be taken into account when comparing two strings, or does this cover all cases:

  • First string is longer.
  • Second string is longer.
  • Null pointers.
  • Empty strings.
  • (Anything else?)

Your s_compare() function has undefined behaviour: it dereferences a NULL pointer when the caller passes a valid (possibly empty) string as str1 and NULL as str2, like this:

s_compare ("abc", NULL);
s_compare ("", NULL);

Both of these calls will result in undefined behaviour.

These cases can be covered without calling strlen() at all:

  • First string is longer.
  • Second string is longer.
  • Empty strings.

To find out whether two strings are the same, use the fact that a C string is a one-dimensional array of characters terminated by a null character \0. Iterate over both strings in step. If they differ, one of two things happens: the characters at some position are different, or one string is a prefix of the other, in which case the longer string has some other character at the position where the shorter string has its null character. In both cases a mismatch is found and the strings are not the same.

Since s_compare() is not supposed to modify the strings passed as arguments, you should declare both pointer parameters as pointers to const. Implementation:

#include <stdbool.h>   // bool, true, false
#include <stddef.h>    // size_t, NULL

bool s_compare (const char* str1, const char* str2) {

    // if it's the same pointer we can return immediately;
    // this also covers the case where both are null pointers
    if (str1 == str2) {
        return true;
    }

    // if one of them is null and the other is not, the strings are not the same
    if ((str1 == NULL) || (str2 == NULL)) {
        return false;
    }

    size_t i = 0;

    // iterate over each character of str1
    while (str1[i]) {
        // if the characters of str1 and str2 at position i differ,
        // the strings are not the same
        if (str1[i] != str2[i]) {
            return false;
        }
        ++i;
    }

    // reaching here means str1 was iterated up to its null
    // character, so str1[i] is '\0'. The strings are the same
    // only if str2[i] is also the null character.
    return str2[i] == '\0';
}

Driver program:

#include <stdio.h>     // printf
#include <string.h>    // strcpy

int main (void) {
    printf ("Compare NULL and NULL : %d\n", s_compare (NULL, NULL));
    printf ("Compare \"abc\" and NULL : %d\n", s_compare ("abc", NULL));
    printf ("Compare \"\" and NULL : %d\n", s_compare ("", NULL));
    printf ("Compare NULL and \"\" : %d\n", s_compare (NULL, ""));

    char s1[10] = {0};
    char s2[10] = {0};

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    strcpy (s1, "ABC");
    strcpy (s2, "ABC");

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    strcpy (s1, "ab");
    strcpy (s2, "ayz");

    printf ("Compare \"%s\" and \"%s\" : %d\n", s1, s2, s_compare (s1, s2));

    return 0;
}

Output:

Compare NULL and NULL : 1
Compare "abc" and NULL : 0
Compare "" and NULL : 0
Compare NULL and "" : 0
Compare "" and "" : 1
Compare "ABC" and "ABC" : 1
Compare "ab" and "ayz" : 0

Are there any places where the efficiency can be improved? What other considerations need to be taken into account when comparing two strings, or does this cover all cases:

Efficient functions = leave error handling to the caller as much as possible. A well-written function should focus on its designated purpose. It is not the purpose of a string comparison function to sanitize pointers on behalf of the calling application, so why should it?

Even setting that program design aspect aside, it is naive to write functions that check whether pointer parameters are NULL, whether they point at the same address, and so on. The caller might have full control over their data and know that it is valid, in which case your checks only add overhead in the form of extra code and extra branches.

What you should do instead is document the function properly in the source, explaining that it expects two pointers to valid data and does no error handling internally. A caller who suspects that their data isn't properly sanitized can then add the NULL checks in the calling code, which was the caller's job all along, not yours.
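
A minimal sketch of that split, not a definitive implementation. The names s_compare_unchecked and s_compare_checked are hypothetical and do not appear in the question. The assert() is an optional extra of mine that disappears when NDEBUG is defined, and treating two NULL pointers as equal is an assumption taken from the question's comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * s_compare_unchecked (hypothetical name): returns true if the two
 * strings have identical contents.
 *
 * Precondition: str1 and str2 both point to valid, null-terminated
 * strings. No error handling is done here; passing NULL is a caller bug.
 */
bool s_compare_unchecked(const char *str1, const char *str2)
{
    /* Optional debug-build check; compiled out when NDEBUG is defined. */
    assert(str1 != NULL && str2 != NULL);

    size_t i = 0;
    while (str1[i] == str2[i]) {
        if (str1[i] == '\0')
            return true;    /* both terminators reached together */
        ++i;
    }
    return false;           /* first mismatching byte found */
}

/*
 * s_compare_checked (hypothetical name): what a caller with
 * unsanitized pointers might write around the core routine.
 * Assumption: two NULL pointers compare equal.
 */
bool s_compare_checked(const char *str1, const char *str2)
{
    if (str1 == NULL || str2 == NULL)
        return str1 == str2;
    return s_compare_unchecked(str1, str2);
}
```

Code that already knows its pointers are valid calls s_compare_unchecked directly and pays for no extra branches in release builds. Code that cannot make that promise goes through the wrapper.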


Another important thing when it comes to efficiency, documentation and general good practice is to use const correctness. That is:

bool s_compare (const char* str1, const char* str2);

This tells the caller that the function won't modify the data, which makes the code self-documenting. It also tells the compiler that the function does not write through these pointers, which may help optimization in some cases. The benefit is limited, though, because a pointer to const does not guarantee that the object itself is const or that it is not modified through some other pointer.

It also prevents the person writing the function from writing accidental bugs, though that argument is quite overrated in my opinion.


Furthermore, old school C comparison functions are written with the syntax

int func (const void* obj1, const void* obj2);

This form has the benefit that you can pass it to bsearch and qsort, so it may be worth switching to it if you need to search or sort the data. In this form the function returns a negative value, zero, or a positive value (less than, equal, greater than) rather than just true/false; see the documentation for bsearch and qsort.
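
As a rough sketch of that form, assuming the data is an array of const char * whose elements are all valid, non-NULL strings. The names cmp_str_ptrs and sort_and_find are hypothetical. Note that qsort and bsearch hand the comparator pointers to the array elements, so each argument has to be dereferenced once before the strings themselves are compared. The comparison is delegated to the standard strcmp, which already returns a negative, zero, or positive value.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * cmp_str_ptrs (hypothetical name): comparator for an array of
 * `const char *`. Each argument points at an array element, i.e. it
 * is really a `const char *const *`.
 */
int cmp_str_ptrs(const void *obj1, const void *obj2)
{
    const char *s1 = *(const char *const *)obj1;
    const char *s2 = *(const char *const *)obj2;
    return strcmp(s1, s2);  /* negative, zero, or positive */
}

/*
 * sort_and_find (hypothetical name): sorts `count` strings in place,
 * then returns 1 if `key` is among them and 0 otherwise.
 */
int sort_and_find(const char **words, size_t count, const char *key)
{
    assert(words != NULL && key != NULL);

    qsort(words, count, sizeof words[0], cmp_str_ptrs);

    /* bsearch passes &key as the first comparator argument, so the
     * key must be wrapped in a pointer just like the array elements. */
    return bsearch(&key, words, count, sizeof words[0], cmp_str_ptrs) != NULL;
}
```

The same comparator serves both calls, which is the main attraction of this signature. An equality-only function returning bool cannot be used this way, because sorting needs an ordering and not just a yes/no answer.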

Before you address efficiency or performance, you must ensure correctness. Your s_compare() function does not check whether its arguments are null pointers. Whether such checks should be performed is a matter of specification: the standard strcmp() function requires valid pointers, whereas free() accepts a null pointer and is specified as such.

From the comment, it seems the function should accept null pointers and return true if both arguments are null pointers. Yet it has undefined behavior if only one of its arguments is NULL.

There is one small issue in the posted code: the variable i has type int, which is different from size_t and most likely has a smaller range. If the string length is INT_MAX or larger, the behavior is undefined. i should have type size_t.

Here are areas of improvement:

  • As the function does not modify the strings, the arguments should have type const char * to allow passing constant strings without a compiler warning.

  • The test for the empty string is redundant: it saves a single comparison if str1 is empty while adding one in all other cases. It should be removed.

  • Testing for identical string pointers is also redundant, but it does improve performance for this particular case, reducing the complexity from linear to constant time. If s_compare is called with identical pointers very often, this might be worthwhile, but in the general case it is not necessary, except for the special case of null pointers if the specification calls for it.

  • Scanning the first string twice, once to compute its length and a second time to compare its characters with the second string, simplifies the code but is counterproductive: the first string is always read completely, even when the difference would be found in the first few bytes. It is much better to compare the contents and test for the end of the string progressively. Note that the comparison with '\0' only needs to happen if the characters are identical.

Here is a modified version:

bool s_compare(const char *str1, const char *str2) {
    // s_compare accepts null pointers and returns true if
    // both arguments are null
    if (!str1 || !str2)
        return str1 == str2;

    // iterate for each byte until a difference is found
    // return true if the identical byte is the null terminator
    for (size_t i = 0; str1[i] == str2[i]; i++) {
        if (str1[i] == '\0') 
            return true;
    }
    return false;
}
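
One way to gain some confidence in a version like this is to check that, for non-NULL inputs, it always agrees with strcmp(a, b) == 0. The sketch below repeats the function so that it compiles on its own. The helper agrees_with_strcmp is a hypothetical name of mine, and the check is only as good as the samples fed to it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* The modified version, repeated so this sketch is self-contained. */
bool s_compare(const char *str1, const char *str2)
{
    if (!str1 || !str2)
        return str1 == str2;

    for (size_t i = 0; str1[i] == str2[i]; i++) {
        if (str1[i] == '\0')
            return true;
    }
    return false;
}

/*
 * agrees_with_strcmp (hypothetical helper): returns true when
 * s_compare and strcmp give the same verdict for every ordered pair
 * of the `count` strings in `samples`. All samples must be non-NULL,
 * because strcmp does not accept null pointers.
 */
bool agrees_with_strcmp(const char *const *samples, size_t count)
{
    assert(samples != NULL);

    for (size_t a = 0; a < count; a++) {
        for (size_t b = 0; b < count; b++) {
            bool expected = strcmp(samples[a], samples[b]) == 0;
            if (s_compare(samples[a], samples[b]) != expected)
                return false;
        }
    }
    return true;
}
```

Useful samples include the empty string, strings where one is a prefix of the other (such as "ab" and "abc"), and strings that differ only in their last character. The NULL cases have to be tested separately, since strcmp cannot serve as a reference for them.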
