C strings confusion
I'm learning C right now and got a bit confused by character arrays, i.e. strings.
char name[15]="Fortran";
No problem with this: it's an array that can hold (up to?) 15 chars.
char name[]="Fortran";
C counts the number of characters for me so I don't have to. Neat!
char* name;
Okay. What now? All I know is that this can hold a big number of characters that are assigned later (e.g. via user input), but
Thanks in advance, lamas
I think this can be explained this way, since a picture is worth a thousand words...
We'll start off with char name[] = "Fortran", which is an array of chars whose length is known at compile time: 7 to be exact, right? Wrong! It is 8, because a '\0' nul terminating character is appended, and every string has to have one.
char name[] = "Fortran";

+======+     +-+-+-+-+-+-+-+--+
|0x1234|     |F|o|r|t|r|a|n|\0|
+======+     +-+-+-+-+-+-+-+--+
At link time, the compiler and linker gave the symbol name a memory address of 0x1234. Using the subscript operator, e.g. name[1], the compiler knows how to calculate where in memory the character at that offset is: 0x1234 + 1 = 0x1235, and it is indeed 'o'. That is simple enough. Furthermore, the C standard defines the size of a char as 1 byte, which explains how the runtime obtains the value of an expression such as name[cnt++]: assuming cnt is an integer with a value of 3, the element at offset 3 (counting from zero) is read, which is 't', and cnt is then stepped up by one automatically. So far so good.
What happens if name[12] was executed? Well, the behavior is undefined: the code may crash, or you may get garbage, since the array only spans index/offset 0 (0x1234) up to 7 (0x123B). Anything after that does not belong to the name variable, and accessing it overruns the buffer (a buffer overflow when you write there)!
The address of name in memory is 0x1234, as in the example. If you were to do this:
printf("The address of name is %p\n", (void *)&name);
Output would be:
The address of name is 0x00001234
For the sake of brevity and keeping with the example, the memory addresses are 32-bit, hence the extra 0's. Fair enough? Right, let's move on.
Now on to pointers... char *name is a pointer to char.
Edit: And we initialize it to NULL as shown. Thanks Dan for pointing out the little error...
char *name = (char*)NULL;

+======+     +======+
|0x5678| ->  |0x0000| -> NULL
+======+     +======+
At compile/link time, name does not point to anything, but the symbol name itself has a compile/link-time address (0x5678). The value stored in it is NULL, i.e. the address that name points to is not yet known, hence 0x0000.
Now, remember, this is crucial: when dealing with pointers of any type, the address of the symbol is known at compile/link time, but the address it points to is not.
Suppose we do this:
name = (char *)malloc((20 * sizeof(char)) + 1);
strcpy(name, "Fortran");
We called malloc to allocate a memory block of 21 bytes: room for 20 characters, plus the 1 I added on to the size for the '\0' nul terminating character. Suppose that at runtime the address given was 0x9876:
char *name;

+======+     +======+     +-+-+-+-+-+-+-+--+
|0x5678| ->  |0x9876| ->  |F|o|r|t|r|a|n|\0|
+======+     +======+     +-+-+-+-+-+-+-+--+
So when you do this:
printf("The value of name is %p\n", (void *)name);
printf("The address of name is %p\n", (void *)&name);
Output would be:
The value of name is 0x00009876
The address of name is 0x00005678
Now, this is where the illusion that 'arrays and pointers are the same' comes into play.
When we do this:
char ch = name[1];
What happens at runtime is this:
The address of the symbol name is looked up, i.e. 0x5678.
That address contains another address, the pointer value, which is fetched, i.e. 0x9876.
The offset based on the subscript value of 1 is added to the pointer address, giving 0x9877, and the value at that memory address, 'o', is retrieved and assigned to ch.
The above is crucial to understanding the distinction: the difference between arrays and pointers is how the runtime fetches the data. With pointers, there is an extra indirection.
Remember, in most expressions an array of type T decays into a pointer to its first element of type T (the operands of sizeof and & are the main exceptions).
When we do this:
char ch = *(name + 5);
The address of the symbol name is looked up, i.e. 0x5678, and the pointer value stored there, 0x9876, is fetched.
The offset based on the value of 5 is added to the pointer address, giving 0x987B, and the value at that memory address, 'a', is retrieved and assigned to ch. Incidentally, you can do the same with the array of chars...
Furthermore, using the subscript operator in the context of an array, i.e. char name[] = "..."; and name[subscript_value], is really the same as *(name + subscript_value), i.e.
name[3] is the same as *(name + 3)
And since the addition in the expression *(name + subscript_value) is commutative, it also works in reverse:
*(subscript_value + name) is the same as *(name + subscript_value)
Hence, this explains why, as in one of the other answers, you can write it like this (the practice is not recommended, even though it is quite legitimate!):
3[name]
OK, how do I get the value the pointer points to? That is what the * is used for. Suppose the pointer name holds the memory address 0x9878; again referring to the above example, this is how it is achieved:
char ch = *name;
This means: obtain the value stored at the memory address 0x9878, so ch now has the value 'r'. This is called dereferencing. We just dereferenced the name pointer to obtain the value and assign it to ch.
Also, the compiler knows that sizeof(char) is 1, hence you can do pointer increment/decrement operations like this:
*name++; *name--;
As a result, the pointer automatically steps up/down by one.
When we do this, assuming the pointer memory address is 0x9878:
char ch = *name++;
What is the value of ch, and what is the address? The answer: because ++ here is a post-increment, ch is assigned the value at the old address 0x9878, i.e. 'r'; the pointer memory address then becomes 0x9879, so *name now yields 't'.
This is where you have to be careful also. In the same principle and spirit as what was stated earlier about memory boundaries (see 'What happens if name[12] was executed?' above), stepping the pointer outside the allocated block gives the same results, i.e. the code crashes and burns!
Now, what happens if we deallocate the block of memory pointed to by name by calling the C function free with name as the parameter, i.e. free(name), and then reset the pointer with name = NULL (free does not change the value stored in name by itself):
+======+     +======+
|0x5678| ->  |0x0000| -> NULL
+======+     +======+
Yes, the block of memory is freed up and handed back to the runtime environment for use by a later call to malloc.
Now, this is where the notorious segmentation fault comes into play. Since name does not point to anything, what happens when we dereference it, i.e.
char ch = *name;
Yes, the code will crash and burn with a 'Segmentation fault'; this is common under Unix/Linux. Under Windows, a dialog box will appear along the lines of 'Unrecoverable error' or 'An error has occurred with the application, do you wish to send the report to Microsoft?'. If the pointer has not been malloc'd, any attempt to dereference it is undefined behavior and will most likely crash and burn.
Also remember this: for every malloc there must be a corresponding free. If there is no corresponding free, you have a memory leak, in which memory is allocated but never freed up.
And there you have it: that is how pointers work and how arrays differ from pointers. If you are reading a textbook that says they are the same, tear out that page and rip it up! :)
I hope this is of help to you in understanding pointers.
That is a pointer, which means it is a variable that holds an address in memory. It "points" to another variable.
It actually cannot, by itself, hold large amounts of characters. By itself, it can hold only one address in memory. If you assign a string literal to it at creation, the compiler will set aside space for those characters and the pointer will point to that address. You can do it like this:
char* name = "Mr. Anderson";
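That is pretty much the same as this, except that the array below is a modifiable copy of the string, whereas the literal the pointer refers to must not be modified: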
char name[] = "Mr. Anderson";
The place where character pointers come in handy is dynamic memory. You can assign a string of any length to a char pointer at any time in the program by doing something like this:
char *name;
name = malloc(256*sizeof(char));
strcpy(name, "This is less than 256 characters, so this is fine.");
Alternatively, you can assign to it using the strdup() function (POSIX; not part of standard C before C23), like this:
char *name;
name = strdup("This can be as long or short as I want. The function will allocate enough space for the string and assign return a pointer to it. Which then gets assigned to name");
If you use a character pointer this way and assign memory to it, you have to free the memory that name points to before reassigning it. Like this:
if(name)
free(name);
name = 0;
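Make sure that name is, in fact, a valid pointer before trying to free its memory. The if statement only guards against a null pointer (which free() would accept anyway); it cannot detect an uninitialized or already-freed pointer, which is why name is set to 0 after freeing.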
The reason you see character pointers used a whole lot in C is that they allow you to reassign the string with a string of a different size. Static character arrays don't do that. Pointers are also easier to pass around.
Also, character pointers are handy because they can be used to point to different statically allocated character arrays. Like this:
char *name;
char joe[] = "joe";
char bob[] = "bob";
name = joe;
printf("%s", name);
name = bob;
printf("%s", name);
This is what often happens when you pass a statically allocated array to a function taking a character pointer. For instance:
char *strcpy(char *str1, const char *str2);
If you then call it like this:
char buffer[256];
strcpy(buffer, "This is a string, less than 256 characters.");
It will work on both of those through str1 and str2, which are just pointers to where buffer and the string literal are stored in memory.
Something to keep in mind when working in a function: if you have a function that returns a character pointer, don't return a pointer to a local character array declared in the function. The array goes out of scope when the function returns, and you'll have issues. Repeat, don't do this:
char *myFunc() {
    char myBuf[64];
    strcpy(myBuf, "hi");
    return myBuf;
}
That won't work. In that case you have to use a pointer and allocate memory (as shown earlier). The allocated memory will then persist even after you leave the function's scope. Just don't forget to free it, as previously mentioned.
This ended up a bit more encyclopedic than I'd intended; hope it's helpful.
Edited to remove C++ code. I mix the two so often, I sometimes forget.
char* name is just a pointer. Somewhere along the line, memory has to be allocated and the address of that memory stored in name.
In C a string is actually just an array of characters, as you can see from the definition. However, superficially, any array is just a pointer to its first element; see below for the subtle intricacies. There is no range checking in C: the range you supply in the variable declaration only determines the memory allocated for the variable.
a[x] is the same as *(a + x), i.e. a dereference of the pointer a incremented by x.
If you used the following:
char foo[] = "foobar";
char bar = *foo;
bar will be set to 'f'.
To stave off confusion and avoid misleading people, some extra words on the more intricate differences between pointers and arrays (thanks avakar):
In some cases a pointer is actually semantically different from an array. A (non-exhaustive) list of examples:
//sizeof
sizeof(char*) != sizeof(char[10])
//lvalues
char foo[] = "foobar";
char bar[] = "baz";
char* p;
foo = bar; // compile error, an array is not a modifiable lvalue
p = bar; //just fine p now points to the array contents of bar
// multidimensional arrays
int baz[2][2];
int* q = baz; //invalid: baz decays to int (*)[2], which is not compatible with int*
int* r = baz[0]; //just fine, r now points to the first element of the first "row" of baz
int x = baz[1][1];
int y = r[1][1]; //compile error, r[1] is an int and cannot be subscripted
int z = r[1]; //just fine, z now holds the second element of the first "row" of baz
And finally a fun bit of trivia: since a[x] is equivalent to *(a + x), you can actually use e.g. '3[a]' to access the fourth element of array a. I.e. the following is perfectly legal code, and will print 'b', the fourth character of string foo.
#include <stdio.h>
int main(int argc, char** argv) {
char foo[] = "foobar";
printf("%c\n", 3[foo]);
return 0;
}
char *name, on its own, can't hold any characters. This is important.
char *name just declares that name is a pointer (that is, a variable whose value is an address) that will be used to store the address of one or more characters at some point later in the program. It does not, however, allocate any space in memory to actually hold those characters, nor does it guarantee that name even contains a valid address. In the same way, if you have a declaration like int number, there is no way to know what the value of number is until you explicitly set it.
Just as, after declaring an integer, you might later set its value (number = 42), after declaring a pointer to char you might later set its value to be a valid memory address that contains a character, or sequence of characters, that you are interested in.
It is confusing indeed. The important thing to understand and distinguish is that char name[] declares an array and char* name declares a pointer. The two are different animals.
However, an array in C can be implicitly converted to a pointer to its first element. This gives you the ability to perform pointer arithmetic and iterate through array elements (it does not matter what type the elements are, char or not). As @which mentioned, you can use either the indexing operator or pointer arithmetic to access array elements. In fact, the indexing operator is just syntactic sugar (another representation of the same expression) for pointer arithmetic.
It is important to distinguish between an array and a pointer to the first element of an array. It is possible to query the size of an array declared as char name[15] using the sizeof operator:
char name[15] = { 0 };
size_t s = sizeof(name);
assert(s == 15);
but if you apply sizeof to char* name you will get the size of a pointer on your platform (e.g. 4 bytes on a 32-bit system):
char* name = 0;
size_t s = sizeof(name);
assert(s == 4); // assuming pointer is 4-bytes long on your compiler/machine
Also, these two forms of defining arrays of char elements are equivalent:
char letters1[5] = { 'a', 'b', 'c', 'd', '\0' };
char letters2[5] = "abcd"; /* 5th element implicitly gets value of 0 */
Thanks to the dual nature of arrays in C (and also C++), i.e. the implicit conversion of an array to a pointer to its first element, a pointer can be used as an iterator to walk through array elements:
/* skip to 'd' letter */
char* it = letters1;
for (int i = 0; i < 3; i++)
it++;
One is an actual array object and the other is a reference or pointer to such an array object.
The thing that can be confusing is that both yield the address of the first character, but only because in one case the address is that of the first character itself and in the other it is read from a word in memory that contains the address of the character.
The difference can be seen in the value of &name. In the first two cases it is the same value as just name, but in the third case it is a different type called pointer to pointer to char, or char **, and it is the address of the pointer itself. That is, it is a double-indirect pointer.
#include <stdio.h>
char name1[] = "fortran";
char *name2 = "fortran";
int main(void) {
printf("%lx\n%lx %s\n", (long)name1, (long)&name1, name1);
printf("%lx\n%lx %s\n", (long)name2, (long)&name2, name2);
return 0;
}
Ross-Harveys-MacBook-Pro:so ross$ ./a.out
100001068
100001068 fortran
100000f58
100001070 fortran