
Confused with char **

If I write

//case 1
char *animals[2] = {"cat", "dog"};
char **animal_ptr = animals; 
printf("%s\n", *(animal_ptr + 1)); //fine

And, in a different way:

//case 2
char *animal = "cat";
char **animal_ptr = &animal;
 *(animal_ptr + 1) = "dog";
printf("%s\n", *(animal_ptr + 1)); //fine

So, I got confused by the two examples above.

  1. In case 1, I understand that animal_ptr is a pointer to a collection of pointers, and since pointers hold addresses I don't need to add an &. But in case 2, I had to add an & for it to work, even though animal is already a pointer. Why?

  2. In both cases, why is modifying the string literals through another pointer acceptable? As far as I know, when you declare a string such as char *x = "..etc";, it's placed in a section of memory that cannot be modified. So why is it that in case 1 both animals and animal_ptr can modify the string?

  3. Why does strcpy(*(animal_ptr + 1), "bird") fail and stop the program, even though the assignment worked in case 2?

  4. In case 2:

    • When I do printf("%s", *animal_ptr), it works fine and makes sense to me.
    • When I do printf("%s", *animal), it stops. Why?

Thanks, and sorry for many questions.

The thing with pointers and strings is that you have to keep track of where the space for them has been allocated, and whether it's writable. You also have to understand the difference between copying strings using strcpy , versus rearranging pointers.

When you use string constants like "cat" and "dog" the compiler allocates space for the strings automatically, but the strings are not modifiable. (That is, you cannot copy new strings on top of them using strcpy .)

In your case 1, you have an "array" of two strings. There are several ways of thinking about this. Since a string is an array of characters, and you have two strings, you can think of this as a "two-dimensional array". That's why animals has one * and one pair of brackets [] , and why animal_ptr has two * 's. Or, since a char * is how we usually refer to strings in C, you can see that animals is an array of two strings.

It's also important to check the allocation of everything. The compiler took care of "cat" and "dog" . You allocated animals as an array of size 2, so that's taken care of. Finally, animal_ptr is set to point to wherever animals is, so it's got proper allocation, too. (But note that animals and animal_ptr refer to the same storage.)

The situation is different in your case 2, however. You start out with just one string, "cat" , pointed to by one pointer, animal . You again point at that one string with a pointer-to-pointer, animal_ptr . Everything's okay so far, but what we have is the equivalent of an array of one string. It's equivalent to

char *animals[1]={"cat"};
char **animal_ptr=animals;

So when you later said *(animal_ptr+1)="dog" , you were writing to a cell of the "array" that did not exist. You ended up overwriting some other part of memory. Sometimes you can get away with that, sometimes it makes some other part of your program behave wrongly, sometimes it makes your program crash.

It might be easier to see this if instead of *(animal_ptr+1)="dog" we were to write the equivalent

animal_ptr[1] = "dog";

Since animal_ptr was the equivalent of a 1-element array, the only legal subscript is [0] , not [1] .

Now to answer your specific questions:

Q1. In both examples, animal_ptr is a pointer to a pointer. In case 1, animals is an array, and in C, you automatically get a pointer to an array's first element without using an explicit & . In case 2, animal is a simple pointer, so you need an explicit & to take its address and generate the pointer-to-pointer that animal_ptr requires.

Q2. In neither case are you modifying any strings. You're right that the strings themselves are not writable, but the array animals in case 1 is writable, so it's fine to plug in new pointers. You can say things like animals[0] = "chicken" and animals[1] = "pig" until the cows come home. In case 2, you can say animal = "chicken" or animal_ptr[0] = "pig" , because the first pointer ( animal ) is writable, but you can't modify animal_ptr[1] , not because it's not writable, but because it doesn't exist.

Q3. You can't use strcpy with either of these examples, because that's trying to copy new characters into one of your existing strings, and that fails because all of your existing strings are compiler-allocated and nonwritable.

If you want to see how strcpy works, you have to allocate some writable storage for strcpy to copy to. You could do something like

char newarray[10];
animal_ptr[0] = newarray;
strcpy(animal_ptr[0], "bird");

or

animal_ptr[0] = malloc(10);
strcpy(animal_ptr[0], "bird");

Q4. When you do printf("%s",*animal_ptr) that's equivalent to

printf("%s", animal_ptr[0]);

printf %s wants a pointer, and you're giving it one. However, when you wrote printf("%s", *animal) , the expression *animal is fetching the first character pointed to by animal , probably the letter 'c' in "cat" . You're then taking that character and asking printf to print it as a string, with %s . But as we just saw, %s wants a pointer. So it tried to print the string at address 99 in memory (because the ASCII value of 'c' is 99), and crashed.

More about pointer assignment versus strcpy

The other thing about strings and pointers in C is that it's pretty confusing at first what it means to assign them. C doesn't have a true, "first-class" built-in string type, and the way strings are represented as arrays of / pointers to characters causes us to always have to keep in mind the distinction between the pointer and what the pointer points to .

To make this very clear, let's switch gears for a moment and think about pointers to integers. Suppose we have

int four = 4;
int five = 5;
int *intptr = &four;

So we have a pointer intptr , and what it points to is the value 4 (which just happens to also be the value of the variable four ). If we then say

intptr = &five;

we are changing the value of the pointer . It used to point to four / 4 , and now it points to five / 5 . Now suppose that we say

*intptr = 6;

In this case, we have changed the pointed-to value . intptr points to the same location it did before ( &five ), but we've changed the value at that location from 5 to 6 . (And of course, now if we said printf("%d", five) we'd somewhat oddly get 6 .)

Now, back to strings. If I have

char *animal = "cat";

and then later I say

animal = "dog";

I have changed where the pointer points. It used to point to a compiler-allocated, read-only piece of memory containing the string "cat" , and now it points to a different, compiler-allocated, read-only piece of memory containing the string "dog" .

Now suppose I say

strcpy(animal, "elephant");

In this case, I am not changing the pointer animal , I am asking strcpy to write new characters to the location pointed to by animal . But, remember, animal currently points to that compiler-allocated, read-only piece of memory containing the string "dog" . And since it's read-only this attempt to write new characters there fails. And even if it didn't fail for that reason, we would have a different problem, because of course "elephant" is bigger than "dog" .

The bottom line is that there are two totally different ways to "assign" strings in C: assigning pointers and calling strcpy . But they really are totally different.

You may have heard that "you can't compare strings using == , < , or > ; you have to call strcmp ". The comparison operators compare the pointers (which is usually not what you want), while strcmp compares the pointed-to characters. But when it comes to assignment, you can do it either way: by reassigning pointers, or by copying characters with strcpy .

But if you're calling strcpy , you always have to make sure that the destination pointer (a) does point somewhere, and that the memory region it points to is (b) big enough and (c) writable. The destination pointer will usually point to a character array you have allocated, or a dynamic memory region obtained with malloc . (That's what I was demonstrating to you in my answer to your Q3.) But the destination cannot be a pointer to a compiler-allocated string. That's very easy to do by mistake, but it doesn't work, as you've seen.

Just because you can treat an array as a pointer doesn't mean that a pointer is pointing to an array. Being wrong about what is in memory at the location a pointer is pointing to is one of the primary sources of bugs in the real world. That's why it's very important that you understand what the compiler is allocating for you (globally or on the stack) and what you need to allocate for yourself (via dynamic allocation, AKA malloc).

So, your case 1 code:

/* case 1 line 1 */ char *animals[2] = { "cat","dog" };

The compiler has preallocated two four-byte blocks of memory and stored the consecutive characters 'c', 'a', 't', and 0 in the first and 'd', 'o', 'g', and 0 in the second. Because these strings were compiler-generated from string literals, they are (theoretically at least) read-only. Trying to change the 'c' to a 'b' in your code invokes the dreaded "undefined behavior". The initialization you are performing here can also draw a complaint from the compiler if you ask it to report suspicious code (usually called warnings); GCC, for example, does so with -Wwrite-strings, because the address of read-only characters is being stored in a non-const char *.

The compiler also preallocates an array of two char * pointers in which it stores the address of the preallocated "cat" string, and the address of the preallocated "dog" string. This array is called animals. The compiler knows that it allocated space for the array and the compiler will release that space once the array goes "out of scope". The animals array has an address which can be stored in other pointer variables.

/* case 1 line 2 */ char **animal_ptr=animals;

Here the compiler preallocates storage for the char ** variable animal_ptr then initializes it with the address of the animals array.

Here's where we start seeing the difference between a variable that is an array and a variable that is a pointer. It is perfectly legal to set animal_ptr = animals, but it would never be legal to set animals = animal_ptr. The take-away here is that an array can be used as a pointer of its equivalent type, but it is not a pointer.

/* case 1 line 3 */ printf("%s\n",*(animal_ptr+1));

Pointer addition is defined as incrementing the contents of the pointer (in this case the address of the animals array) by the size of whatever it is pointing to. In this case animal_ptr is pointing to a char *, so animal_ptr+1 points to the second element of the animals array (aka &animals[1]). Dereferencing *(animal_ptr+1) yields a char * pointer whose value is the address of the "dog" string. printf's %s conversion then uses that address to print the string dog .

/* case 2 line 1 */ char *animal="cat";

Compiler preallocates storage for string ('c', 'a', 't', 0). Compiler preallocates space (sizeof(char *)) for pointer animal and initializes it with the address of the "cat" string (the same warning as before would apply here).

/* case 2 line 2 */ char **animal_ptr=&animal;

Compiler preallocates storage space (sizeof(char **)) for pointer animal_ptr and initializes it with the address of the animal pointer.

/* case 2 line 3 */  *(animal_ptr+1)="dog";

First the compiler preallocates space for the "dog" string, then it tries to store the address of that allocated block...somewhere.

This is the main error. animal_ptr is holding the address of animal, but the storage allocated at that address is only wide enough for one pointer (which the compiler preallocated in line 1). When you do the animal_ptr+1 here you've moved the pointer beyond the space allocated by the compiler for animal. Thus you (probably) have a valid memory address, but it doesn't point into a location in memory that is known to be allocated. This is undefined behavior, and the results of dereferencing this memory (to either write to it here or read from it on the next line) can't be predicted. For sure you've just stored the address of the "dog" string on top of whatever happened to be in memory just past the animal pointer.

/* case 2 line 4 */ printf("%s\n",*(animal_ptr+1)); //fine

Well, you can say it's fine, but it really isn't. You are again dereferencing a pointer stored in unallocated memory. If you are lucky this works. If you are unlucky you just corrupted your stack and when you carry onward something totally unexpected will happen. This kind of thing is exactly the sort of error that privilege exploits are based on.

To answer your specific questions:

  1. Pointers are just memory addresses, but the compiler cares what it thinks they are pointing to. In case 2 animal has type 'char *' and animal_ptr has type 'char **'. &animal has type 'char **' so the compiler accepted that.

  2. Nothing in your code is attempting to modify the contents of the "dog" and "cat" strings. Even if it had, it might still appear to work, because modifying a string literal is undefined behavior (in C a literal has type char[N] rather than const char[N], but writing to it is still undefined). On systems where the loader placed the strings in writable memory it would probably work. On systems that placed the strings in read-only memory you would get a memory error (a segmentation fault, for instance).

  3. Well, it depends on where the strcpy was placed. I'm guessing you mean in place of case 2 line 3? In that case *(animal_ptr + 1) is an uninitialized pointer, so when strcpy attempts to do the copy who knows where it's trying to write the string.

  4. *animal is of type char. %s is expecting something of type char *. When printf attempts to dereference the value that was passed in, it's not a valid pointer so it walks off into who knows where trying to read the string.

In both cases animal_ptr is a double pointer: a pointer to another pointer. In case 1, it specifically points to the first element in your array, and in case 2, to the only one.

You could think about pointer variables as having "levels", with a lv2 pointer being a pointer to another pointer, a lv3 pointing to a lv2, and so on. When declaring variables, each * and [] "increases" your level by 1. When assigning values, * accesses the info inside the pointer and therefore "lowers" a level, and & asks for that pointer's location in memory, therefore "increasing" it.

PS: In case 2, you're actually messing up badly! You're writing over the location pointed to by animal_ptr+1, which could either be used by another variable in that function's stack (messing up your code big time) or not be part of the stack at all, in which case you get a segfault!

Edit: to better answer your questions:

Q(1): animal is a lv1 pointer, therefore &animal is a lv2 pointer, the same as the char** animal_ptr you're assigning it to.

Q(2): There's a big difference between changing the value of a pointer variable and directly modifying the contents of the location it points to, even if you may get similar-looking results with either one, depending on your code.

Q(3): "dog" is an array of 4 chars ( 'd','o','g','\0' ), and in case 1 you're trying to replace it with an array of 5, "bird" . In case 2, animal_ptr+1 is pointing to God knows where, so a segfault is expected.

Q(4): If you understood my lousy levels explanation, you should understand now that *animal is a char (The first letter of the animal string), not a char pointer (string), and therefore can't be printed with %s format.

In your first case you have declared an array of pointers that has a length of 2. Your initialization char **animal_ptr = animals in this first case does not require &animals because in C an array used in an expression like this is automatically converted to a pointer to its first element (here &animals[0], of type char **).

Your second case is not fine at all. When you initialize char **animal_ptr = &animal you can effectively view animal_ptr as a pointer to an array of pointers with a length of 1. Consequently your assignment *(animal_ptr+1) = "dog" attempts to store a pointer to the array "dog" in the non-existent second position of that array. This may appear to succeed (i.e. your program may not have crashed), but you have actually corrupted memory by assigning to a location that has not been allocated.

In your Q3 you are compounding the error in case 2 by now attempting to copy the array "bird" into the memory pointed to by *(animal_ptr+1). As we already know, animal_ptr+1 is an unallocated location and is unsafe to use. In addition, the location it holds (due to your faulty assignment) is another string literal, which is not writable.

As for Q4. Nothing is safe to use after you have corrupted memory.

//case 1
char *animals[2]={"cat","dog"};
char **animal_ptr=animals; 
printf("%s\n",*(animal_ptr+1));      //fine
printf("%s\n", animal_ptr[1]);      //equivalent
printf("%d\n", (void *)&animals == (void *)animals); // prints 1: same address, different types


//case 2
char *animal="cat";
char **animal_ptr=&animal;
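 *(animal_ptr+1)="dog";          // out of bounds: animal is a single char *, so this is undefined behavior
printf("%s\n",*(animal_ptr+1)); // may appear fine, but reads the same out-of-bounds slot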
printf("%s\n", animal);         // prints cat

Q(1)

Case 1:

  • animals is a 2-element array of pointers to char
  • in particular, animals itself is not a pointer, but in this initialization it decays to a pointer to its first element (which is itself a pointer to char)
  • animal_ptr is a pointer to a pointer to char
  • animals, converted to char**, is assigned to animal_ptr of type char**
  • note animals holds the same address as &animals - they hold the same value, but are of different types (see here for explanation)

Case 2:

  • animal is a pointer to char type (an array of chars is initialized with "cat" )
  • in particular animal is a pointer (to the first element, which is the character 'c' )
  • animal_ptr is a pointer to a pointer to a char type
  • animal_ptr of type char** is initialized with the address of the pointer animal , and &animal has type char** ( read again : animal points to a char , &animal points to a char* )
  • the assignment is not really clear to me; you are basically defining a new string "dog"
  • note you still can print "cat"

Q(2)

I cannot see where you modify the strings :-(

  • *animal_ptr points to the cat string
  • *(animal_ptr+1) holds another address and points to the "dog" string

Print them out, you'll see.

Q(3)

I think you did not reassign, but define a new string from that address. The string is of length 3 and you want to copy a length 4 string. Thus it fails.

Q(4)

Change printf("%s", *animal) to printf("%c", *animal) , because you print a character then, not a string.

A type char*[] can decay to char** in such an assignment so you do not need the address-of operator there. In the second case you just have a char* where you need to take the address of it to get a char**.

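Leaving out const does not make the literals writable. String literals have static storage duration (they are not re-created on the stack each time the function is called), and modifying one is undefined behavior whether or not the implementation places it in a read-only section. What your code actually modifies is the pointers, which are writable.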

If something is working it does not mean that the code is correct. In your second case you have not allocated a second char* after the first, so increasing animal_ptr and dereferencing it is not legal and has undefined behaviour. Anything you do after that may work or crash or do whatever.

  1. In C, an array can be thought of as a non-assignable pointer. The conversion from array to pointer is automatic. So you can always write this

     int array[5]; int *ptr = array; 

What you see is a pointer to pointer: it stores the address of another pointer, which in turn stores the address of the data. Nothing stops us from doing this

    int **array[5];
    int ***ptr = array; 

and going even deeper in pointers to pointers.

  2. There is no const in the variable declarations, so the pointers themselves are writable; what gets reassigned is the pointers, not the characters of the literals.

  3. and 4. The second code snippet is pretty messed up. It corrupts memory, as others said. What you do is take the address of the pointer pointing to the beginning of the string, move forward by one pointer, and save a pointer to the "dog" string in this unprepared memory. And as others said, once memory is corrupted nothing is certain, not even correct program execution.
