
strcat with char pointer to a string literal

I was just trying to understand the code below, which was asked in a recent interview.

#include <stdio.h>
#include <string.h>

int main() {
    char *ptr = "Linux";
    char a[] = "Solaris";
    strcat(a, ptr);
    printf("%s\n", ptr);
    printf("%s\n", a);
    return 0;
}

Execution trace:

gcc -Wall -g prog.c
gdb a.out

(gdb) p ptr
$15 = 0x400624 "Linux"
(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"
(gdb) p a
$21 = "SolarisL"
(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"
(gdb)
$23 = 0x7fffffffe7f0 "SolarisLinux"
(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>
(gdb)

I have a few questions:

  1. Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

  2. Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

If I understand your question correctly, you are aware that the program has undefined behavior because a is not large enough to hold the string "Solaris" concatenated with "Linux".

So the answer you're looking for is not "This is undefined behavior" but rather:

Why is it behaving this way?

When dealing with undefined behavior we can't give a general explanation of what's going on. It may do different things on different systems, with different compilers (or compiler versions), and so on.

Therefore it's often said that it makes no sense to try to explain what is going on in a program with undefined behavior. And that's correct.

However, sometimes you can find an explanation for your specific system. Just remember that it is specific to your system and in no way universal.

So I changed your code to add some debug prints:

#include<stdio.h>
#include<string.h>

int main()
{
    char *ptr = "Linux";
    char a[] = "Solaris";
    printf("   a = %p\n", (void*)a);
    printf("&ptr = %p\n", (void*)&ptr);
    printf(" ptr = %p\n", (void*)ptr);

    // Print the data that ptr holds
    unsigned char* p = (unsigned char*)&ptr;

    printf("\nBefore strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", (unsigned char)a[i]);
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n");

    strcat(a,ptr);

    printf("\nAfter strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", (unsigned char)a[i]);
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n\n");

    printf("%s\n", a);

    printf("%s\n", ptr);

    return 0;
}

On my system this generates:

   a = 0x7ffff3ce5050
&ptr = 0x7ffff3ce5058
 ptr = 0x400820

Before strcat
  a:
53 6f 6c 61 72 69 73 00
  ptr:
20 08 40 00 00 00 00 00

After strcat
  a:
53 6f 6c 61 72 69 73 4c
  ptr:
69 6e 75 78 00 00 00 00

SolarisLinux
Segmentation fault

Here is the output with some comments added:

   a = 0x7ffff3ce5050   // The address where the array a is stored
&ptr = 0x7ffff3ce5058   // The address where ptr is stored. Notice 8 higher than a
 ptr = 0x400820         // The value of ptr

Before strcat
  a:
53 6f 6c 61 72 69 73 00 // Hex dump of a gives Solaris\0
  ptr:
20 08 40 00 00 00 00 00 // Hex dump of ptr is the value 0x0000000000400820 (little endian system)

// Here strcat is executed

After strcat
  a:
53 6f 6c 61 72 69 73 4c // Hex dump of a gives SolarisL
  ptr:
69 6e 75 78 00 00 00 00 // Oops.. ptr has changed! It's not a valid pointer value anymore
                        // As a string it is inux\0

SolarisLinux            // print a
Segmentation fault      // print ptr crashes because ptr doesn't hold a valid pointer value

So on my system the explanation is:

a is located in memory just before ptr, so when strcat writes out of bounds of a it actually overwrites the value of ptr. Consequently the program crashes when trying to use ptr as a valid pointer.

So for your specific questions:

1) Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

No. It's the value of ptr that has been overwritten. The string literal is most likely untouched.

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

This is a guess, nothing more. My guess is that gdb knows that a is 8 bytes, so printing a directly only prints 8 bytes. When printing a + 0, my guess is that gdb sees a + 0 as a pointer (and therefore can't know the object size), so gdb keeps printing until it sees a zero terminator.
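The type difference behind that guess can be seen in C itself. The sketch below says nothing about how gdb is implemented; it only shows that a has an array type with a known size, while a + 0 is a plain char * with no length attached. The function names are made up for illustration.

```c
#include <stddef.h>

/* Size of the array object itself: all 8 bytes of "Solaris" plus '\0'. */
size_t size_of_array(void)
{
    char a[] = "Solaris";
    return sizeof a;
}

/* Size of the expression a + 0: the array has decayed to a char *, so
   this is only the size of a pointer. The length of a is no longer
   part of the type. */
size_t size_of_decayed(void)
{
    char a[] = "Solaris";
    return sizeof(a + 0);
}
```

On a typical 64-bit system both happen to be 8, but for different reasons: the first is the array length, the second is the pointer width.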

If the question is "I know it's wrong, but why did it do that ?", there are sort of two ways of answering it.

(1) Undefined behavior means anything can happen. Taking an array of size 8 and writing 13 characters to it is a really wrong thing to do. You're overwriting five bytes of memory that were presumably in use for something else, so overwriting them means... anything can happen. (But now I'm repeating myself.)
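The size mismatch in (1), 13 bytes needed against 8 available, is something a program can check before it concatenates. The following is a minimal sketch, assuming the caller knows the destination's capacity; append_checked is a hypothetical helper, not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not a standard function): append src to dst only
   if the result, including the terminating '\0', fits in dst_size bytes.
   Returns 0 on success, -1 if it would not fit (dst is left unchanged). */
int append_checked(char *dst, size_t dst_size, const char *src)
{
    size_t used = strlen(dst);
    size_t extra = strlen(src);

    /* We need used + extra + 1 <= dst_size. */
    if (used >= dst_size || extra >= dst_size - used)
        return -1;

    memcpy(dst + used, src, extra + 1);   /* copies src and its '\0' */
    return 0;
}
```

With char a[] = "Solaris", a call such as append_checked(a, sizeof a, "Linux") refuses the append instead of writing five bytes past the end of a.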

I know you asked the question in all sincerity, but I have to say, to me these questions always sound like: "I ran through a busy intersection when the sign said Don't Walk. A blue car ran over me, and I broke my left leg. I don't understand why. Why wasn't I hit by a red truck? Why didn't I break my right arm?"

(2) Let's look at a likely layout of the memory allocated for this program:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | \0 |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | 78 | 56 | 34 | 12 |
            +----+----+----+----+

            +----+----+----+----+----+----+
0x12345678: | L  | i  | n  | u  | x  | \0 |
            +----+----+----+----+----+----+

Here I'm imagining that the string "Linux" is stored at address 0x12345678 , so ptr holds that value. I'm imagining that your machine uses 32-bit pointers. (These days, though, it might well use 64.) I'm imagining that your machine uses "little endian" byte order, meaning that the bytes making up the pointer ptr are stored in memory in the opposite order from what you might expect.

You said that after calling strcat , a printed out the concatenated string you expected, but the program crashed when you tried to print ptr . Let's change the printout of ptr to

printf("%p: %s\n", (void *)ptr, ptr);

Before the call to strcat , this will print something like

0x12345678: Linux

But here's what the call to strcat actually does:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | L  |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | i  | n  | u  | x  | \0
            +----+----+----+----+

Now, the printout of ptr is going to be something like

0x78756e69: Segmentation violation (core dumped)

You overwrote the pointer ptr , so it no longer points to address 0x12345678 where the string "Linux" is stored, it now points to location 0x78756e69 , where those hex digits come from the characters inux . If you don't have permission to access address 0x78756e69 , you'll get a crash. If you do have permission to access location 0x78756e69 , you'll get some garbage string printed.
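The arithmetic behind that value can be reproduced directly. This is a small sketch assuming an ASCII character set and the 32-bit little-endian layout imagined above; le32_from_bytes is a name invented for this example.

```c
#include <stdint.h>

/* Build the 32-bit value that a little-endian machine would read from
   four bytes b0, b1, b2, b3 stored at increasing addresses: the first
   byte in memory becomes the least significant byte of the value. */
uint32_t le32_from_bytes(unsigned char b0, unsigned char b1,
                         unsigned char b2, unsigned char b3)
{
    return (uint32_t)b0
         | ((uint32_t)b1 << 8)
         | ((uint32_t)b2 << 16)
         | ((uint32_t)b3 << 24);
}
```

Under those assumptions, le32_from_bytes('i', 'n', 'u', 'x') gives 0x78756e69, because 'i' is 0x69, 'n' is 0x6e, 'u' is 0x75 and 'x' is 0x78. In the same way, the bytes 78 56 34 12 from the first diagram give back 0x12345678.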

Now, with all of that said, it's important to note that this is not necessarily what will happen. I've assumed that the compiler stored the pointer ptr right after the array a in memory. That's one possibility, but obviously not the only possibility. If the compiler happened to store ptr somewhere else, then something else would get overwritten by inux , and something else might go wrong. Or nothing might go wrong. (In other words, you might get hit by the blue car, or you might get hit by the red truck, or you might get lucky and make it across the street without being hit at all.)


Addendum: I've just looked at your post more carefully, and I see that gdb told you that ptr had changed to 0x78756e69 , and that it couldn't access the memory there. But now we know where that strange value 0x78756e69 probably came from. :-)

Well, here we've got a pointer mistake.

I'll try to be clear:

Constant strings (like "Linux" and "Solaris" ) are stored in a specific memory area of the program. For your program, among other strings (like error messages, for instance), there should be an area with something like: "Linux\0Solaris\0%s\n\0%s\n\0" .

When you do:

char *ptr = "Linux";
char a[] = "Solaris";

you set ptr to the address of the 'L' char, and you are given 8 * sizeof(char) bytes of memory on the stack, into which "Solaris\0" is then copied.

When you concatenate those two strings, since you never created a new memory space (with malloc or char str[50], for instance), you ask strcat to write past the end of the stack memory reserved for a. This is the kind of programming mistake that causes a stack buffer overflow.
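One way to create that new memory space is to allocate a buffer sized for both strings before copying. The sketch below is only one possible approach, and concat_alloc is a hypothetical helper name; a sufficiently large local array such as char str[50] would work as well.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated string holding s1
   followed by s2, or NULL if the allocation fails. The caller is
   responsible for calling free() on the result. */
char *concat_alloc(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1);
    size_t n2 = strlen(s2);
    char *out = malloc(n1 + n2 + 1);   /* +1 for the terminating '\0' */

    if (out == NULL)
        return NULL;

    memcpy(out, s1, n1);
    memcpy(out + n1, s2, n2 + 1);      /* also copies s2's '\0' */
    return out;
}
```

Calling concat_alloc(a, ptr) in the original program would leave both a and ptr untouched and return "SolarisLinux" in separate storage.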

Here gdb tries its best to display the strings.

(gdb) p ptr
$15 = 0x400624 "Linux"

Pointer to the static string area, displayed correctly.

(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"

Pointer to stack displayed as you would expect

(gdb) p a
$21 = "SolarisL"

An array of 8 chars: gdb knows the size, so it displays only the first 8 chars.

(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"

Pointer to stack (gdb doesn't know the size since you do pointer arithmetic)

(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>

This one is tricky. See here, ptr does not hold the same address as the first time you printed it. It is likely that you wrote over the value of ptr at some point (as you wrote somewhere on the stack that you shouldn't have).

1) Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

Nope. strcat only reads from the literal; the original location is not overwritten.

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

It's a debugger; it is written to avoid some types of error, so when it can, it reads only what should be read.
