Using arrays of character strings: arrays of pointers - Are they like multidimensional arrays?

I've been reading C++ for Dummies lately, and either the title is a misnomer or they didn't count on me. In a section about using arrays of pointers with character strings, they show a function that has me completely stumped, and I don't know where to turn.

char* int2month(int nMonth)
{
//check to see if value is in range
if ((nMonth < 0) || (nMonth > 12))
    return "invalid";

//nMonth is valid - return the name of the month
char* pszMonths[] = {"invalid", "January", "February", "March", "April", "May", "June", 
                     "July", "August", "September", "October", "November", "December"};

return pszMonths[nMonth];
} 

First off (but not the main question), I don't understand why the return type is a pointer and how you can return pszMonths without it going out of scope. I've read about it in this book and online, but I don't get it in this example.

The main question I have is "how does this work?!?!". I don't understand how you can create an array of pointers and actually initialize them. If I remember correctly, you can't do this with numeric data types. Is each pointer in the "array of pointers" like an array itself, containing the individual characters which make up the words? This whole thing just boggles my mind.

August 20 - Since there seems to be some confusion among the people trying to help me as to where my confusion actually stems from, I'll try to explain it better. The section of code I am particularly concerned with is the following:

//nMonth is valid - return the name of the month
char* pszMonths[] = {"invalid", "January", "February", "March", "April", "May", "June", 
                 "July", "August", "September", "October", "November", "December"};

I thought that when you made a pointer you could only assign it to another predetermined value. I'm confused that what seems to be an array of pointers (going by the book here) initializes the month names. I did not think pointers could actually initialize values. Is the array dynamically allocating memory? Is "invalid" essentially equivalent to a "new char;" statement or something similar?

I'll try re-reading the posts in case they answered my questions and I just didn't understand the first time around.

OK, let's take one line at a time.

char* int2month(int nMonth)

This line is most probably WRONG, because it says the function returns a pointer to a modifiable char (by convention this will be the first char element of an array). Instead it should say char const* or const char* as the result type. These two specifications mean exactly the same thing, namely a pointer to a char that you cannot modify.

{

This is just the opening brace of the function body. The function body ends at the corresponding closing brace.

//check to see if value is in range

This is a comment. It is ignored by the compiler.

if ((nMonth < 0) || (nMonth > 12))
    return "invalid";

Here the return statement is executed if and only if the condition in the if holds. The purpose is to deal with an incorrect argument value in a predictable way. However, the checking is probably WRONG because it allows both values 0 and 12 as valid, which gives a total of 13 valid values, whereas a calendar year has only 12 months.

By the way, technically, for the return statement the specified return value is an array of 8 char elements, namely the 7 characters plus a null byte at the end. This array is implicitly converted to a pointer to its first element, which is called a type decay. This particular decay, from string literal to pointer to non-const char, is specially supported in C++98 and C++03 in order to be compatible with old C, but is invalid in the upcoming C++0x standard (since published as C++11).

The book should not teach such ugly things; use const for the result type.


//nMonth is valid - return the name of the month
char* pszMonths[] = {"invalid", "January", "February", "March", "April", "May", "June", 
                     "July", "August", "September", "October", "November", "December"};

This array initialization again involves that decay. It's an array of pointers. Each pointer is initialized with a string literal, which type-wise is an array, and decays to a pointer.

By the way, the "psz" prefix is a monstrosity called Hungarian Notation. It was invented for C programming, supporting the help system in Microsoft's Programmer's Workbench. In modern programming it serves no useful purpose but instead just makes the simplest code read like gibberish. You really don't want to adopt that.

return pszMonths[nMonth];

This indexing has formal Undefined Behavior, also known affectionately as just "UB", if nMonth is the value 12, since there is no array element at index 12. In practice you'll get some gibberish result.

EDIT: oh, I didn't notice that the author has placed the month name "invalid" at the front, which makes for 13 array elements, so index 12 does exist and the UB remark above does not apply. How to obscure code... I didn't notice it because it's very bad and unexpected; the checking for "invalid" is done higher up in the function.


} 

And this is the closing brace of the function body.

Cheers & hth.,

Perhaps a line-by-line explanation will help.

/* This function takes an int and returns the corresponding month
 0 returns invalid
 1 returns January
 2 returns February
 3 returns March
 ...
 12 returns December
*/
char* int2month(int nMonth)
{
// if nMonth is less than 0 or more than 12, it's an invalid number
if ((nMonth < 0) || (nMonth > 12))
    return "invalid";

// this line creates an array of char* (strings) and fills it with the names of the months
//
char* pszMonths[] = {"invalid",  // index 0
                     "January",  // index 1
                     "February", // index 2
                     "March",    // index 3
                     "April",    // index 4
                     "May",      // index 5
                     "June",     // index 6
                     "July",     // index 7
                     "August",   // index 8
                     "September",// index 9
                     "October",  // index 10
                     "November", // index 11
                     "December"  // index 12
                    };

// use nMonth to index the pszMonths array to return the appropriate month
// if nMonth is 1, returns January because pszMonths[1] is January
// if nMonth is 2, returns February because pszMonths[2] is February
// etc
return pszMonths[nMonth];
} 

The first thing to get out of the way, which you might not know, is that a string literal in your program (stuff with double quotes around it) is really of the char* type [1].

The second thing that you might not have realized is that indexing into an array of char*s (which is what char* pszStrings[] is) yields a char*, which is a string.

The reason you can return something from local scope in this instance is that string literals are stored in the program at compile time and do not get destroyed. For instance, this is perfectly fine:

char* blah() { return "blah"; }

And it's almost like doing this [2]:

int blah() { return 5; }

Secondly, when you have an = {/* stuff */} after an array declaration, that's called an initializer list. If you leave off the size of the array like you're doing, the compiler figures out how big to make the array by how many elements are in the initializer list. So char* pszMonths[] means "an array of char*", and since you have "invalid", "January", "February", etc. in the initializer list and they are char*s [1], you're just initializing your array of char*s with some char*s. And you misremembered about not being able to do this with numeric types, because you can do this with any type, numeric types and strings included.

[1] It's not really a char*, it's a char const[x], and you cannot modify that memory like you could through a char*, but that's not important to you right now.

[2] It's not really like that, but if it helps you to think of it that way, feel free until you get better at C++ and can handle the various subtleties without dying.

What's your expectation of what int2month is supposed to do?

Do you have a mental model of what the memory looks like? Here's my pictorial representation of the memory, for example:

pszMonths =      [   .       ,     .   ,   .    , ...]
                     |             |       |
                     |             |       |
                     V             |       |   
                     "invalid"     |       V
                                   |    "February"
                                   V
                               "January"

pszMonths is an array, which you should already be familiar with. The array's elements are pointers, though. You have to follow the arrows down to their values, which in this case are strings. This kind of indirect representation is necessary: it's not easy to do this with a flat representation, because each month name has its own, variable length.

It's very hard to tell where you're getting stuck without more discussion. You need to say more.

[Edit]

Ok, you've said a little more. It sounds like you need to know a little more about C's program model. When your program compiles, it reduces down to a code part and a data part.

What's included in the data part? Things like string literals. Each string literal is laid out somewhere in memory. If your compiler is good, and if you use the same literal twice, your compiler won't store two copies, but will reuse one.

Here's a small program to demonstrate.

#include <inttypes.h>
#include <stdio.h>
int main(void) {
  const char *name1 = "foo";
  const char *name2 = "foo";
  const char *name3 = "bar";

  /* Convert through uintptr_t so the full address prints on 64-bit targets too. */
  printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name1);
  printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name2);
  printf("The address of the string in the data segment is: %" PRIuPTR "\n", (uintptr_t) name3);
  return 0;
}

Here's what things look like when I run this program:

$ ./a.out
The address of the string in the data segment is: 134513904
The address of the string in the data segment is: 134513904
The address of the string in the data segment is: 134513908

When you run a C program, the data part of your program (as well as the code part, of course) gets loaded into memory. Any pointer that refers to a location in data is good as long as your program continues to run. In particular, a pointer to somewhere in the data is valid across function calls.

Look at the outputs more closely. name1 and name2 are pointers to the same place in data, because it's the same literal string. Your C compiler is often very good at keeping the data compact and unfragmented, which is why you can see that the bytes for "bar" are stored right up against the bytes for "foo".

(What we're seeing is a low-level detail, and it is not always the case that the compiler will pack the string literals side by side: your compiler has the freedom to put the representation of those strings pretty much anywhere. But it's cute to see that it's doing so here.)

As a related note, that's why it's ok for a C program to do something like this:

char* good_function() {
  char* msg = "ok";
  return msg;
}

but not ok to do something like this:

char* bad_function() {
  char msg[] = "uh oh";
  return msg;
}

These two functions have entirely different meanings!

  1. The first tells the compiler: "Store this string in the data segment. When you run this function, give me back the address into the data segment".
  2. The second, bad function here says: "When you run this function: make a temporary variable on the stack with enough space to write 'uh oh'. Now pop off the temporary space and return an address into the stack... oh, wait, that address is not pointing anywhere good, it is..."

This code does not return pszMonths, but it returns one of the pointers contained in pszMonths. These point to string literals, which remain valid even when pszMonths goes out of scope.

One part of this code that is confusing is that it returns a char* rather than a char const*. This means that it is easy to accidentally modify the strings. Attempting to do so would result in undefined behaviour.

Typically, string literals are implemented by placing the strings in the data section of the executable. This means that pointers to them always remain valid. When the code in int2month is executed, pszMonths is filled up with pointers, but the underlying data is sitting elsewhere in the executable.

As I said earlier, this code is very unsafe, and doesn't deserve to be enshrined by being published in a book. String literals can be bound to char*, but they are actually made up of char consts. This makes it very easy to accidentally attempt to modify them, which will actually result in undefined behaviour. The only reason that this behaviour exists is to maintain compatibility with C, and it should never be used in new code.

In C, strings are simply sequences of bytes stored in sequential memory locations, with a 0 byte marking the end of the string. For example,

char *s = "abcd";

would result in the compiler allocating 2 memory locations: one five bytes long (abcd plus the terminating 0) and one large enough to hold the address of the first one (s). The second location is a pointer variable; the first one is what it points to.

For an array of strings, the compiler again allocates two memory locations. For

char *strings[] = {"abc", "def"};

strings will have two pointers in it, and the other location will hold the bytes abc\0def\0. Then the first pointer points at a and the second at d.

First of all, let's assume char* can be replaced with string.

So:

string int2month(int nMonth)
{ /* ... */ }

You return a pointer to char because you can't return an array of chars in C or C++.


In this line:

return "invalid";

"invalid" lives in the program's memory. This means it's always there for you. (But it's undefined behaviour if you try to change it directly without first copying it elsewhere with strcpy()! [1])


Imagine this:

char* szInvalid = "invalid";
char* szJanuary = "January";
char* szFebruary = "February";

string szMarch = "March";

char* pszMonths[] = {szInvalid, szJanuary, szFebruary, szMarch};

Do you see why it's an array of char*s?


[1] If you do this:

char* szFoo = "invalid";
szFoo[0] = '!'; szFoo[1] = '?';

char* szBar = "invalid"; // This *might* happen: szBar == "!?valid"
