
strcat with char pointer to a string literal

I was just trying to understand the code below, which was asked in a recent interview.

#include <stdio.h>
#include <string.h>

int main() {
    char *ptr = "Linux";
    char a[] = "Solaris";
    strcat(a, ptr);
    printf("%s\n", ptr);
    printf("%s\n", a);
    return 0;
}

Execution trace:

gcc -Wall -g prog.c
gdb a.out

(gdb) p ptr
$15 = 0x400624 "Linux"
(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"
(gdb) p a
$21 = "SolarisL"
(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"
(gdb)
$23 = 0x7fffffffe7f0 "SolarisLinux"
(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>
(gdb)

I have a few questions:

  1. Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

  2. Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

If I understand your question correctly, you are aware that the program has undefined behavior because a cannot hold the string "Solaris" concatenated with "Linux".

So the answer you're looking for is not "This is undefined behavior" but rather:

Why is it behaving this way?

When dealing with undefined behavior, we can't give a general explanation of what's going on. It may do different things on different systems, or with different compilers (or compiler versions), and so on.

Therefore it's often said that it makes no sense to try to explain what is going on in a program with undefined behavior. And that's correct.

However, sometimes you can find an explanation for your specific system. Just remember that it is specific to your system and in no way universal.

So I changed your code to add some debug printing:

#include<stdio.h>
#include<string.h>

int main()
{
    char *ptr = "Linux";
    char a[] = "Solaris";
    printf("   a = %p\n", (void*)a);
    printf("&ptr = %p\n", (void*)&ptr);
    printf(" ptr = %p\n", (void*)ptr);

    // Print the data that ptr holds
    unsigned char* p = (unsigned char*)&ptr;

    printf("\nBefore strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(a+i));
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n");

    strcat(a,ptr);

    printf("\nAfter strcat\n");
    printf("  a:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(a+i));
    printf("\n");

    printf("  ptr:\n");
    for (int i = 0; i < 8; ++i) printf("%02x ", *(p+i));
    printf("\n\n");

    printf("%s\n", a);

    printf("%s\n", ptr);

    return 0;
}

On my system this generates:

   a = 0x7ffff3ce5050
&ptr = 0x7ffff3ce5058
 ptr = 0x400820

Before strcat
  a:
53 6f 6c 61 72 69 73 00
  ptr:
20 08 40 00 00 00 00 00

After strcat
  a:
53 6f 6c 61 72 69 73 4c
  ptr:
69 6e 75 78 00 00 00 00

SolarisLinux
Segmentation fault

Here is the output with some comments added:

   a = 0x7ffff3ce5050   // The address where the array a is stored
&ptr = 0x7ffff3ce5058   // The address where ptr is stored. Notice it is 8 higher than a
 ptr = 0x400820         // The value of ptr

Before strcat
  a:
53 6f 6c 61 72 69 73 00 // Hex dump of a gives Solaris\0
  ptr:
20 08 40 00 00 00 00 00 // Hex dump of ptr is the value 0x0000000000400820 (little endian system)

// Here strcat is executed

After strcat
  a:
53 6f 6c 61 72 69 73 4c // Hex dump of a gives SolarisL
  ptr:
69 6e 75 78 00 00 00 00 // Oops... ptr has changed! It's not a valid pointer value anymore
                        // As a string it is inux\0

SolarisLinux            // print a
Segmentation fault      // print ptr crashes because ptr doesn't hold a valid pointer value

So on my system the explanation is:

a is located in memory just before ptr, so when strcat writes out of bounds of a, it actually overwrites the value of ptr. Consequently the program crashes when trying to use ptr as a valid pointer.

So for your specific questions:

1) Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

No. It's the value of ptr that has been overwritten. The string literal is most likely untouched.

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

This is a guess, nothing more. My guess is that gdb knows that a is 8 bytes, so printing a directly prints only 8 bytes. When printing a + 0, my guess is that gdb treats a + 0 as a pointer (and therefore can't know the object size), so gdb keeps printing until it sees a zero terminator.

If the question is "I know it's wrong, but why did it do that?", there are sort of two ways of answering it.

(1) Undefined behavior means anything can happen. Taking an array of size 8 and writing 13 characters to it is a really wrong thing to do. You're overwriting five bytes of memory that were presumably in use for something else, so overwriting them means... anything can happen. (But now I'm repeating myself.)

I know you asked the question in all sincerity, but I have to say, to me these questions always sound like: "I ran through a busy intersection when the sign said Don't Walk. A blue car ran over me, and I broke my left leg. I don't understand why. Why wasn't I hit by a red truck? Why didn't I break my right arm?"

(2) Let's look at a likely layout of the memory allocated for this program:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | \0 |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | 78 | 56 | 34 | 12 |
            +----+----+----+----+

            +----+----+----+----+----+----+
0x12345678: | L  | i  | n  | u  | x  | \0 |
            +----+----+----+----+----+----+

Here I'm imagining that the string "Linux" is stored at address 0x12345678, so ptr holds that value. I'm imagining that your machine uses 32-bit pointers. (These days, though, it might well use 64.) I'm imagining that your machine uses "little endian" byte order, meaning that the bytes making up the pointer ptr are stored in the opposite order in memory than you might expect.

You said that after calling strcat, a printed out the concatenated string you expected, but the program crashed when you tried to print ptr. Let's change the printout of ptr to

printf("%p: %s\n", (void *)ptr, ptr);

Before the call to strcat, this will print something like

0x12345678: Linux

But here's what the call to strcat actually does:

            +----+----+----+----+----+----+----+----+
         a: | S  | o  | l  | a  | r  | i  | s  | L  |
            +----+----+----+----+----+----+----+----+

            +----+----+----+----+
       ptr: | i  | n  | u  | x  | \0
            +----+----+----+----+

Now, the printout of ptr is going to be something like

0x78756e69: Segmentation violation (core dumped)

You overwrote the pointer ptr, so it no longer points to address 0x12345678 where the string "Linux" is stored. It now points to location 0x78756e69, where those hex digits come from the characters inux. If you don't have permission to access address 0x78756e69, you'll get a crash. If you do have permission to access location 0x78756e69, you'll get some garbage string printed.

Now, with all of that said, it's important to note that this is not necessarily what will happen. I've assumed that the compiler stored the pointer ptr right after the array a in memory. That's one possibility, but obviously not the only possibility. If the compiler happened to store ptr somewhere else, then something else would get overwritten by inux, and something else might go wrong. Or nothing might go wrong. (In other words, you might get hit by the blue car, or you might get hit by the red truck, or you might get lucky and make it across the street without being hit at all.)


Addendum: I've just looked at your post more carefully, and I see that gdb told you that ptr had changed to 0x78756e69, and that it couldn't access the memory there. But now we know where that strange value 0x78756e69 probably came from. :-)

Well, here we've got a pointer mistake.

I'll try to explain it clearly:

Constant strings (like "Linux" and "Solaris") are stored in a specific memory area of the program. For your program, among other strings (error messages, for instance), there should be an area containing something like: "Linux\0Solaris\0%s\n\0%s\n\0".

When you do:

char *ptr = "Linux";
char a[] = "Solaris";

you set ptr to the address of the 'L' char, and you are given 8 * sizeof(char) bytes of memory on the stack, into which "Solaris\0" is then copied.

When you concatenate those two strings, since you never created a new memory space (using malloc or char str[50], for instance), you ask strcat to write past the end of the stack memory reserved for a. This is the kind of programming mistake that causes a stack buffer overflow.

Here gdb tries its best to display the strings.

(gdb) p ptr
$15 = 0x400624 "Linux"

Pointer to the static string area, displayed correctly.

(gdb) p a+1
$20 = 0x7fffffffe7f1 "olarisLinux"

Pointer into the stack, displayed as you would expect.

(gdb) p a
$21 = "SolarisL"

Here a is an 8-char array; gdb knows the size, so it shows you only the first 8 chars.

(gdb) p a+0
$22 = 0x7fffffffe7f0 "SolarisLinux"

Pointer into the stack (gdb doesn't know the size since you did pointer arithmetic).

(gdb) p ptr
$24 = 0x78756e69 <error: Cannot access memory at address 0x78756e69>

This one is tricky. Notice that ptr does not hold the same address as the first time you printed it. It is likely that you overwrote the value of ptr at some point (since you wrote somewhere on the stack that you shouldn't have).

1) Does strcat remove the string literal from its original location, given that accessing ptr gives a segmentation fault?

Nope. strcat only reads from the source string, so the literal at its original location is not modified.

2) Why doesn't p a in gdb give the proper output, whereas p a+0 shows "SolarisLinux"?

It's a debugger, and it is written to avoid certain kinds of errors, so when it can, it reads only what should be read.
