
C char pointer problem

If we declare char *p = "hello"; then, since the literal is stored in the data section, we cannot modify the contents that p points to, but we can modify the pointer itself. However, I found this example in "C Traps and Pitfalls" by Andrew Koenig (AT&T Bell Laboratories, Murray Hill, New Jersey 07974).

The example is:

char *p, *q;
p = "xyz";
q = p;
q[1] = 'Y';

q would point to memory containing the string xYz. So would p, because p and q point to the same memory.

How can this be true if the first statement I mentioned is also true? Similarly, I ran the following code:

main()
{
char *p="hai friends",*p1;
p1=p;
while(*p!='\0') ++*p++;
printf("%s %s",p,p1);
}

and got the output ibj!gsjfoet

Please explain how we are able to modify the contents in both of these cases. Thanks in advance.

Your same example causes a segmentation fault on my system.

You're running into undefined behavior here. .data (note that the string literal might be in .text too) is not necessarily immutable: depending on the operating system and compiler, there is no guarantee that the machine will write-protect that memory (via page tables).
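For comparison, a well-defined variant of the question's loop might look like the sketch below. It assumes the intent was simply to add one to every character; increment_chars is a hypothetical helper name, not something taken from the question or the book. The difference that matters is that it only writes to storage the caller owns, such as an array initialised from the literal.

```c
#include <stddef.h>

/* Hypothetical helper: adds 1 to every character of the NUL-terminated
   string in buf and returns the number of characters changed.
   buf must point to writable storage (an array or malloc'd memory),
   never directly to a string literal. */
size_t increment_chars(char *buf)
{
    size_t n = 0;

    while (buf[n] != '\0') {
        ++buf[n];   /* same effect as ++*p++ in the question, one step at a time */
        ++n;
    }
    return n;
}
```

With char s[] = "hai friends"; a call to increment_chars(s) leaves s holding ibj!gsjfoet on an ASCII system, which matches the output reported in the question, but without ever writing to the literal itself.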

Only your OS can guarantee that the contents of the data section are read-only, and even that involves setting segment limits and access flags, using far pointers and so on, so it's not always done.

C itself has no such limitation; in a flat memory model (which almost all 32-bit OSes use these days), any byte in your address space is potentially writable, even in your code section. If you had a pointer to main(), some knowledge of machine language, and an OS that had things set up just right (or rather, failed to prevent it), you could potentially rewrite main() to just return 0. Note that this is all black magic of a sort and is rarely done intentionally, but it's part of what makes C such a powerful language for systems programming.

Even if you can do this and there seem to be no errors, it's a bad idea. Depending on the program in question, you could end up making buffer overflow attacks very easy. A good article explaining this is:

https://www.securecoding.cert.org/confluence/display/seccode/STR30-C.+Do+not+attempt+to+modify+string+literals

main()
{
int i = 0;
char *p= "hai friends", *p1;
p1 = p;
while(*(p + i) != '\0')
    {
        *(p + i);
        i++;
    }
printf("%s %s", p, p1);
return 0;
}

This code will give the output: hai friends hai friends

Whether that works or not depends on the compiler.

x86 is a von Neumann architecture (as opposed to Harvard), so there's no clear difference between 'data' and 'program' memory at the basic level (i.e. the compiler isn't forced into having different types for program vs. data memory, and so won't necessarily restrict any variable to one or the other).

So one compiler may allow modification of the string while another does not.

My guess is that a more lenient compiler (e.g. cl, the MS Visual Studio C++ compiler) would allow this, while a stricter compiler (e.g. gcc) would not. If your compiler allows it, chances are it's effectively changing your code to something like:

...
char p[] = "hai friends";
char *p1 = p;
...
// (some disassembly required to really see what it's done though)

perhaps with the 'good intention' of allowing new C/C++ coders to code with less restriction and fewer confusing errors. (Whether this is a 'Good Thing' is up for much debate, and I will keep my opinions mostly out of this post :P)

Out of interest, what compiler did you use?

In olden days, when C as described by K&R in their book "The C Programming Language" was the (ahem) "standard", what you describe was perfectly OK. In fact, some compilers jumped through hoops to make string literals writable. They'd laboriously copy the strings from the text segment to the data segment on initialisation.

For a long time gcc had a flag to restore this behaviour, -fwritable-strings, although it was removed in GCC 4.0.

Modifying string literals is a bad idea, but that doesn't mean it can't appear to work.

One really good reason not to: your compiler is allowed to take multiple instances of the same string literal and make them point to the same block of memory. So if "xyz" were used somewhere else in your code, you could inadvertently break other code that was expecting it to be constant.
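Whether a given implementation merges identical literals can be checked with a small sketch like the one below. The function name literals_share_storage is made up for illustration, and either result is permitted by the language, so portable code should not rely on the answer.

```c
/* Hypothetical probe: returns 1 if this implementation stores the two
   identical literals at the same address, 0 if they are distinct.
   Both outcomes are allowed, so treat the result as informational only. */
int literals_share_storage(void)
{
    const char *a = "xyz";
    const char *b = "xyz";

    /* Comparing the two pointers for equality is well defined;
       only writing through them would be undefined behavior. */
    return a == b;
}
```

If the probe returns 1, a write through one pointer (on a platform that happens to allow it) would silently change every other use of "xyz" as well.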

Your program also works on my system (Windows + Cygwin). However, the standard says you shouldn't do that; the behavior is undefined.

The following excerpt is from the book C: A Reference Manual, 5/E, page 33:

You should never attempt to modify the memory that holds the characters of a string constant, since it may be read-only.

char p1[] = "Always writable";
char *p2 = "Possibly not writable";
const char p3[] = "Never writable";

Writing through p1 will always work; writing through p2 may work or may cause a run-time error; writing to p3 will always cause a compile-time error.
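A minimal sketch of how these declarations behave in practice follows. It assumes you control the declarations and can mark pointers to literals as const char *, which turns a possible run-time crash into a compile-time diagnostic. The names replace_char and demo_writable_array are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: replaces every occurrence of `from` with `to` in a
   caller-owned, writable buffer and returns the number of replacements. */
size_t replace_char(char *buf, char from, char to)
{
    size_t changed = 0;

    for (; *buf != '\0'; ++buf) {
        if (*buf == from) {
            *buf = to;
            ++changed;
        }
    }
    return changed;
}

/* Hypothetical demo: edits the array form, copies the result into out,
   and returns the number of characters changed. */
size_t demo_writable_array(char *out, size_t out_size)
{
    char p1[] = "Always writable";             /* array: a private, writable copy */
    const char *p2 = "Possibly not writable";  /* pointer to the literal itself */
    size_t changed = replace_char(p1, 'a', 'A');

    /* replace_char(p2, 'o', '0');  <- the compiler must diagnose this call,
       because it would discard the const qualifier */
    (void)p2;

    if (out_size > 0) {
        strncpy(out, p1, out_size - 1);
        out[out_size - 1] = '\0';
    }
    return changed;
}
```

Called with a 32-byte buffer, demo_writable_array fills it with AlwAys writAble and returns 2, while the literal behind p2 is never written to.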

While modifying a string literal may be possible on your system, that's a quirk of your platform rather than a guarantee of the language. The actual C language doesn't know anything about .data sections or .text sections. That's all implementation detail.

On some embedded systems, you won't even have a filesystem to contain a file with a .text section. On some such systems, your string literals will be stored in ROM, and trying to write to the ROM will just crash the device.

If you write code that depends on undefined behavior and only works on your platform, you can be sure that sooner or later somebody will think it is a good idea to port it to some new device that doesn't work the way you expected. When that happens, an angry pack of embedded developers will hunt you down and stab you.

p is effectively pointing to read-only memory. The result of assigning to the array p points to is probably undefined behavior. Just because the compiler lets you get away with it doesn't mean it's OK.

Take a look at this question from the C-FAQ: comp.lang.c FAQ list · Question 1.32

Q: What is the difference between these initializations?

char a[] = "string literal";
char *p  = "string literal";

My program crashes if I try to assign a new value to p[i].

A: A string literal (the formal term for a double-quoted string in C source) can be used in two slightly different ways:

  1. As the initializer for an array of char, as in the declaration of char a[], it specifies the initial values of the characters in that array (and, if necessary, its size).
  2. Anywhere else, it turns into an unnamed, static array of characters, and this unnamed array may be stored in read-only memory, and which therefore cannot necessarily be modified. In an expression context, the array is converted at once to a pointer, as usual (see section 6), so the second declaration initializes p to point to the unnamed array's first element.

Some compilers have a switch controlling whether string literals are writable or not (for compiling old code), and some may have options to cause string literals to be formally treated as arrays of const char (for better error catching).
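The difference between the two initializations also shows up in sizeof, as the small sketch below illustrates. The function names are hypothetical; the array form owns a copy of the characters, while the pointer form only stores an address.

```c
#include <stddef.h>

/* Size of the array form: 14 characters plus the terminating NUL. */
size_t array_form_size(void)
{
    char a[] = "string literal";   /* a is its own writable array */
    return sizeof a;
}

/* Size of the pointer form: just the size of a pointer, whatever that
   is on the platform, regardless of how long the literal is. */
size_t pointer_form_size(void)
{
    const char *p = "string literal";   /* p points at the unnamed array */
    return sizeof p;
}
```

On any conforming implementation array_form_size() returns 15, whereas pointer_form_size() returns sizeof(char *), typically 4 or 8.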

I think you are confused about a very important general concept that you need to understand when using C, C++ or other low-level languages. In a low-level language there is an implicit assumption that the programmer knows what s/he is doing and makes no programming errors.

This assumption allows the implementers of the language to simply ignore what should happen if the programmer violates the rules. The end effect is that in C or C++ there is no "runtime error" guarantee: if you do something bad, what happens next is simply NOT DEFINED ("undefined behaviour" is the legalese term). It may be a crash (if you're very lucky), or apparently nothing at all (unfortunately, most of the time), perhaps followed by a crash in a perfectly valid place a million executed instructions later.

For example, if you access memory outside an array, you MAY get a crash, you may not, or a daemon may even come out of your nose (these are the "nasal daemons" you may find mentioned on the internet). It's just not something that whoever wrote the compiler had to think about.

Just never do that (if you care about writing decent programs).

An additional burden on those who use low-level languages is that you must learn all the rules very well and never violate them. If you violate a rule, you cannot expect a "runtime error angel" to help you; only "undefined behaviour daemons" are present down there.


 