How does strchr implementation work

I tried to write my own implementation of the strchr() method.

It now looks like this:

char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}

The last line originally was

return s;

But this didn't work because s is const. I found out that there needs to be this cast (char *), but I honestly don't know what I am doing there :( Can someone explain?

I believe this is actually a flaw in the C Standard's definition of the strchr() function. (I'll be happy to be proven wrong.) (Replying to the comments, it's arguable whether it's really a flaw; IMHO it's still poor design. It can be used safely, but it's too easy to use it unsafely.)

Here's what the C standard says:

char *strchr(const char *s, int c);

The strchr function locates the first occurrence of c (converted to a char) in the string pointed to by s. The terminating null character is considered to be part of the string.
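The edge cases in that description can be checked directly against the library function. A minimal sketch (the function name strchr_edge_cases is made up for this example):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Exercises the documented edge cases of the standard strchr(). */
void strchr_edge_cases(void) {
    const char *s = "hello";

    /* First occurrence wins: "hello" has 'l' at indices 2 and 3. */
    assert(strchr(s, 'l') == s + 2);

    /* The terminating null character is part of the string,
       so searching for '\0' finds the terminator itself. */
    assert(strchr(s, '\0') == s + strlen(s));

    /* A character that does not occur yields a null pointer. */
    assert(strchr(s, 'z') == NULL);
}
```

Note that none of these calls needs to write through the returned pointer, so the missing const on the return type does no harm here.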

Which means that this program:

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *s = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    return 0;
}

even though it carefully defines the pointer to the string literal as a pointer to const char, has undefined behavior, since it modifies the string literal. gcc, at least, doesn't warn about this, and the program dies with a segmentation fault.

The problem is that strchr() takes a const char* argument, which means it promises not to modify the data that s points to -- but it returns a plain char*, which permits the caller to modify the same data.

Here's another example. It quietly modifies a const-qualified object without any casts. (I first thought this one was well defined; on further thought, I believe it has undefined behavior too, because modifying an object defined as const is undefined.)

#include <stdio.h>
#include <string.h>

int main(void) {
    const char s[] = "hello";
    char *p = strchr(s, 'l');
    *p = 'L';
    printf("s = \"%s\"\n", s);
    return 0;
}

Which means, I think, (to answer your question) that a C implementation of strchr() has to cast its result to convert it from const char* to char*, or do something equivalent.
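The question's function, cast included, can be checked against the library version. A minimal test sketch; mystrchr is copied from the question, and check_mystrchr is a name made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The implementation from the question, unchanged. */
char *mystrchr(const char *s, int c) {
    while (*s != (char) c) {
        if (!*s++) {
            return NULL;
        }
    }
    return (char *)s;
}

/* Compares mystrchr() against the library strchr() on a few inputs. */
void check_mystrchr(void) {
    const char *s = "hello";
    assert(mystrchr(s, 'h') == strchr(s, 'h'));   /* match at the start */
    assert(mystrchr(s, 'l') == strchr(s, 'l'));   /* first of two matches */
    assert(mystrchr(s, 'z') == NULL);             /* no match */
    assert(mystrchr(s, '\0') == s + 5);           /* the terminator is found */
    assert(mystrchr("", 'a') == NULL);            /* empty string */
}
```

The cast on the last line of mystrchr only changes the pointer's type; the address returned is the same one the loop stopped at.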

This is why C++, in one of the few changes it makes to the C standard library, replaces strchr() with two overloaded functions of the same name:

const char * strchr ( const char * str, int character );
      char * strchr (       char * str, int character );

Of course C can't do this.

An alternative would have been to replace strchr by two functions, one taking a const char* and returning a const char*, and another taking a char* and returning a char*. Unlike in C++, the two functions would have to have different names, perhaps strchr and strcchr.
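A minimal sketch of that two-function alternative, written as thin wrappers around the real strchr(). The names my_strcchr and my_strchr are hypothetical; the my_ prefix is used because identifiers that begin with str followed by a lowercase letter are reserved for the library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical const-preserving variant: const in, const out. */
const char *my_strcchr(const char *s, int c) {
    return strchr(s, c);
}

/* Hypothetical mutable variant: accepts only a pointer to non-const data. */
char *my_strchr(char *s, int c) {
    return strchr(s, c);
}

void demo_split(void) {
    char buf[] = "hello";        /* writable array */
    const char *lit = "hello";   /* string literal, must not be modified */

    char *p = my_strchr(buf, 'l');
    assert(p == buf + 2);
    *p = 'L';                    /* fine: buf is writable */
    assert(strcmp(buf, "heLlo") == 0);

    const char *q = my_strcchr(lit, 'l');
    assert(q == lit + 2);
    /* *q = 'L';  would be rejected: q points to const char */
}
```

With this split, passing lit to my_strchr draws a diagnostic for discarding the const qualifier, which is the protection the standard signature gives up.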

(Historically, const was added to C after strchr() had already been defined. This was probably the only way to keep strchr() without breaking existing code.)

strchr() is not the only C standard library function that has this problem. The list of affected functions (I think this list is complete but I don't guarantee it) is:

void *memchr(const void *s, int c, size_t n);
char *strchr(const char *s, int c);
char *strpbrk(const char *s1, const char *s2);
char *strrchr(const char *s, int c);
char *strstr(const char *s1, const char *s2);

(all declared in <string.h>) and:

void *bsearch(const void *key, const void *base,
    size_t nmemb, size_t size,
    int (*compar)(const void *, const void *));

(declared in <stdlib.h>). All these functions take a pointer to const data that points to the initial element of an array, and return a non-const pointer to an element of that array.

The practice of returning non-const pointers to const data from non-modifying functions is actually an idiom rather widely used in the C language. It is not always pretty, but it is rather well established.

The rationale here is simple: strchr by itself is a non-modifying operation. Yet we need strchr functionality for both constant and non-constant strings, ideally in a form that propagates the constness of the input to the constness of the output. Neither C nor C++ provides any elegant support for this concept, meaning that in both languages you will have to write two virtually identical functions in order to avoid taking any risks with const-correctness.

In C++ you would be able to use function overloading by declaring two functions with the same name

const char *strchr(const char *s, int c);
char *strchr(char *s, int c);

In C you have no function overloading, so in order to fully enforce const-correctness in this case you would have to provide two functions with different names, something like

const char *strchr_c(const char *s, int c);
char *strchr(char *s, int c);

Although in some cases this might be the right thing to do, it is typically (and rightfully) considered too cumbersome and involved by C standards. You can resolve this situation in a more compact (albeit more risky) way by implementing only one function

char *strchr(const char *s, int c);

which returns a non-const pointer into the input string (by using a cast at the exit, exactly as you did). Note that this approach does not violate any rules of the language, although it provides the caller with the means to violate them. By casting away the constness of the data, this approach simply delegates the responsibility to observe const-correctness from the function itself to the caller. As long as the caller is aware of what's going on and remembers to "play nice", i.e. uses a const-qualified pointer to point to const data, any temporary breaches in the wall of const-correctness created by such a function are repaired instantly.

I see this trick as a perfectly acceptable approach to reducing unnecessary code duplication (especially in absence of function overloading). The standard library uses it. You have no reason to avoid it either, assuming you understand what you are doing.
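For completeness: since C11, a _Generic selection can pick a result type based on the argument type, which gives a way to propagate constness with a single name. This is only a sketch under that assumption (a C11 or later compiler); find_char is a hypothetical macro, not something the answers above or the library provide.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* find_char is a hypothetical name. The result type follows the argument:
   const char * in gives const char * out, char * in gives char * out. */
#define find_char(s, c)                                  \
    _Generic((s),                                        \
        const char *: (const char *) strchr((s), (c)),   \
        char *:       (char *) strchr((s), (c)))

void demo_generic(void) {
    char buf[] = "hello";
    char *m = buf;               /* pointer to writable data */
    const char *k = "hello";     /* pointer to data that must stay intact */

    char *p = find_char(m, 'l');         /* char * in, char * out */
    *p = 'L';
    assert(strcmp(buf, "heLlo") == 0);

    const char *q = find_char(k, 'l');   /* const in, const out */
    assert(q == k + 2);
    /* char *bad = find_char(k, 'l');  discards const, so it is diagnosed */
}
```

The demo passes pointer variables rather than arrays so that the selection does not depend on how a particular compiler treats an array as the controlling expression.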

Now, as for your implementation of strchr, it looks weird to me from a stylistic point of view. I would use the loop header to iterate over the full range we are operating on (the full string), and use the inner if to catch the early termination condition

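for (; *s != '\0'; ++s)
  if (*s == (char) c)
    return (char *) s;

/* The terminator counts as part of the string, so '\0' matches here. */
return (char) c == '\0' ? (char *) s : NULL;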

But things like that are always a matter of personal preference. Someone might prefer to just

for (; *s != '\0' && *s != (char) c; ++s)
  ;

return *s == (char) c ? (char *) s : NULL;

Some might say that modifying a function parameter (s) inside the function is bad practice.
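For readers who share that concern, a variant that iterates with a local pointer leaves s untouched. A sketch, with mystrchr_local and check_local as made-up names:

```c
#include <assert.h>
#include <stddef.h>

/* Same behavior as strchr(), but the parameter s is never modified. */
char *mystrchr_local(const char *s, int c) {
    const char *p = s;           /* iterate with a local copy of the pointer */
    const char ch = (char) c;    /* strchr compares against c converted to char */

    for (; *p != ch; ++p) {
        if (*p == '\0') {
            return NULL;         /* reached the end without a match */
        }
    }
    return (char *) p;           /* also reached when ch is '\0' itself */
}

void check_local(void) {
    const char *s = "hello";
    assert(mystrchr_local(s, 'l') == s + 2);
    assert(mystrchr_local(s, '\0') == s + 5);
    assert(mystrchr_local(s, 'z') == NULL);
}
```

Since s is passed by value, modifying it inside the function never affects the caller either way; the local copy is purely a readability choice.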

The const keyword in const char *s means that the characters s points to cannot be modified through s. (The pointer s itself can still be changed.)

You couldn't return s directly because s is declared as const char *s and the return type of the function is char * . If the compiler allowed you to do that, it would be possible to override the const restriction.

Adding an explicit cast to char* tells the compiler that you know what you're doing (though as Eric explained, it would be better if you didn't do it).

UPDATE: For the sake of context I'm quoting Eric's answer, since he seems to have deleted it:

You should not be modifying s since it is a const char *.

Instead, define a local variable that represents the result of type char * and use that in place of s in the method body.


strchr accepts a const char* and should return const char* also. You are returning a non-constant pointer, which is potentially dangerous since the return value points into the input character array (the caller might be expecting the constant argument to remain constant, but it is modifiable if any part of it is returned as a char * pointer).


Also strchr is supposed to return NULL if the sought character is not found. If it returns non-NULL when the character is not found, or s in this case, the caller (if he thinks the behavior is the same as strchr) might assume that the first character in the result actually matches (without the NULL return value there is no way to tell whether there was a match or not).

(I'm not sure if that is what you intended to do.)


I wrote and ran several tests on this function; I added a few really obvious sanity checks to avoid potential crashes:

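#include <limits.h>
#include <string.h>

const char *mystrchr1(const char *s, int c) {
    if (s == NULL) {
        return NULL;
    }
    /* Accept any value that fits in a char, whether char is signed or not. */
    if ((c > UCHAR_MAX) || (c < CHAR_MIN)) {
        return NULL;
    }
    size_t s_len = strlen(s);
    size_t i;
    /* i <= s_len so that the terminating '\0' can be matched too. */
    for (i = 0; i <= s_len; i++) {
        if ((char) c == s[i]) {
            return &s[i];
        }
    }
    return NULL;
}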

You're no doubt seeing compiler errors anytime you write code that tries to use the char* result of mystrchr to modify the string literal being passed to mystrchr.

Modifying string literals is a security no-no, because it can lead to abnormal program termination and possibly denial-of-service attacks. Compilers may warn you when you pass a string literal to a function taking char* , but they aren't required to.

How do you use strchr correctly? Let's look at an example.

This is an example of what not to do:

#include <stdio.h>
#include <string.h>

/** Truncate a null-terminated string $str starting at the first occurrence 
 *  of a character $c. Return the string after truncating it.
 */
const char* trunc(const char* str, char c){
  char* pc = strchr(str, c);
  if(pc && *pc && *(pc+1)) *(pc+1)=0;
  return str;
}

See how it modifies the string literal str via the pointer pc? That's no bueno.

Here's the right way to do it:

#include <stddef.h>
#include <stdio.h>
#include <string.h>

/** Truncate a null-terminated string $str of $sz bytes starting at the first 
 *  occurrence of a character $c. Write the truncated string to the output buffer 
 *  $out. Return $out on success, or a null pointer on failure.
 */
char* trunc(size_t sz, const char* str, char c, char* out){
  const char* c_pos = strchr(str, c);
  if(c_pos){
    ptrdiff_t c_idx = c_pos - str;
    if((size_t)c_idx < sz){ // the prefix plus its terminator must fit in sz bytes
      memcpy(out, str, c_idx); // copy out all chars before c
      out[c_idx]=0; // terminate with null byte
      return out;
    }
  }
  return 0; // strchr couldn't find c, or had serious problems
}

See how the pointer returned by strchr is used to compute the index of the matching character in the string? The index (which is also the number of characters before the match) is then used to copy the desired part of the string to the output buffer.

You might think "Aw, that's dumb! I don't want to use strchr if it's just going to make me memcpy." If that's how you feel, I've never run into a use case of strchr, strrchr, etc. that I couldn't get away with using a while loop and isspace, isalnum, etc. Sometimes it's actually cleaner than using strchr correctly.
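As an illustration of that while-loop style, here is a sketch that finds the end of the first word with isspace instead of strchr. The names first_space and demo_first_space are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the first whitespace character
   in s, or to the terminator if there is none. const in, const out. */
const char *first_space(const char *s) {
    /* isspace() needs a value representable as unsigned char (or EOF),
       hence the cast. */
    while (*s != '\0' && !isspace((unsigned char) *s)) {
        ++s;
    }
    return s;
}

void demo_first_space(void) {
    const char *s = "hello world";
    const char *p = first_space(s);
    assert(p == s + 5);              /* points at the ' ' */
    assert((size_t)(p - s) == 5);    /* length of the first word */
    assert(*first_space("nospace") == '\0');
}
```

Because the helper returns const char *, the caller cannot write through the result by accident, which sidesteps the problem discussed in this thread.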
