qsort to compare strings for alphabetical order

I'm using the qsort() that comes with stdlib.h to sort an array of structures, each of which holds a string.

It's essentially an array of strings, but each string is wrapped in a structure.

For example:

typedef struct node {
  char name[MAX_SIZE + 1];
} Node;

Then my array of nodes that contains the names would be:

Node nodes_list[MAX_SIZE + 1];

I want to sort nodes_list so that when I print the following:

for (i = 0; i < size; i++) {
   printf("%s\n", nodes_list[i].name);
}

it prints all the names in alphabetical order.

I would like to sort the list using qsort, and my comparator function is this:

int compare(const void *a, const void *b) {
  const char **ia = (const char **)a;
  const char **ib = (const char **)b;
  return strcmp(*ia, *ib);
}

When I run the function with qsort:

qsort(nodes_list, size, sizeof(Node), compare);

I get a segmentation fault (core dumped).

I know the segmentation fault comes from this snippet of code because without it, I can print the list of names fine. Not sorted, of course.

Can someone help?

Your comparison function is wrong for your array format.

Here's a simple checklist you can follow to get the types and sizes right when using qsort:

  1. The third argument to qsort should be sizeof *x where x is the first argument.
  2. The first thing inside the comparison function should be a declaration of a pair of pointers initialized by copying the function's arguments. There should not be any cast. Casts from void * aren't necessary.
  3. You might think you need a cast because of const, but if you do, it's because you've put the const in the wrong place. To assign a const void * successfully without a cast, the destination type should have exactly one * after the const keyword. const char * and char const * are OK (and equivalent to each other); const char *const * is also OK (and different); const char ** is wrong. And if you can't put a const before the * because you don't have a * because you typedef'ed the pointer type, this is why you shouldn't do that.
  4. Aside from the addition of const, the type of the pointers declared at the beginning of the comparison function should be exactly the same as the type of the first argument to qsort, after applying the "array decays to pointer" rule if the first argument to qsort is the name of an array.

In your case, the first argument to qsort is nodes_list, which is an array of Node, so apply the decay-to-pointer rule and you get a Node *, then add a const and you get:

const Node *a_node = a;
const Node *b_node = b;

Now you have a nice pair of properly typed pointers, and you simply compare them in the obvious way:

return strcmp(a_node->name, b_node->name);

To explain why rule #4 works, you have to look closely at the memory layout. Supposing that MAX_SIZE is 15, so that MAX_SIZE+1 is a nice round 16, your Node type contains a 16-byte array of char, and your nodes_list contains 16 of those for a total of 16*16=256 bytes. Suppose that nodes_list is located at memory address 0x1000. Then the layout is:

+---------------+---------------+               +---------------+
| nodes_list[0] | nodes_list[1] |...............| nodes_list[15]|
+---------------+---------------+               +---------------+
^               ^                               ^               ^
0x1000          0x1010                          0x10f0          0x1100

Addresses 0x1000 through 0x10ff are actually part of the object. 0x1100 is the trailing edge - one byte past the end.

Suppose further that the array is half-full (size is 8), and it is populated with these 8 strings:

Hotel Foxtrot Echo Charlie Golf Delta Bravo Alpha 

and that the unused portions are filled with 0's. The object is made up of these 256 bytes (I've added spaces and line breaks for illustration purposes):

H  o  t  e  l \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
F  o  x  t  r  o  t \0 \0 \0 \0 \0 \0 \0 \0 \0
E  c  h  o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
C  h  a  r  l  i  e \0 \0 \0 \0 \0 \0 \0 \0 \0
G  o  l  f \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
D  e  l  t  a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
B  r  a  v  o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
A  l  p  h  a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
... 128 more \0's

Now, you pass qsort the starting address of this block of memory (first arg, nodes_list, 0x1000) plus 2 pieces of information about its internal structure: the number of elements (2nd arg, size, 8) and the size of each element (3rd arg, sizeof(Node), 16). With that information it knows that the elements of the array are at addresses 0x1000, 0x1010, 0x1020, ... 0x1070. It picks a pair of them - which pair it chooses depends on what sorting algorithm it uses - let's say for simplicity it is a stupid bubble sort which starts by comparing the first two elements.

qsort calls your comparison function with the addresses of the elements, 0x1000 and 0x1010. It doesn't know their types, but it knows their sizes. Each one is an array element occupying 16 bytes.

Your comparison function receives a=0x1000 and b=0x1010. They are pointers to 16-byte objects - specifically, they each point to a Node. If you do the wrong thing, and cast them to char **, what happens? Well, you get a char ** with value 0x1000, and you have to dereference that char ** to get a char * to pass to strcmp, so you do that dereference, and end up loading the bytes 'H', 'o', 't', 'e' as a pointer value (assuming your pointers are 4 bytes long). On a big-endian machine with ASCII as the charset, this is a pointer to memory address 0x486f7465, which you pass to strcmp. strcmp crashes. The result of trying Node ** is basically the same.

Another good thing to know is how qsort uses the member size information in its reordering of the array. The 3rd arg is not just the size of an object that the comparison acts on, it's also the size of the object that gets moved as a unit when reordering the array. After your comparison function returns a positive value (strcmp("Hotel", "Foxtrot")), our hypothetical bubble sort implementation of qsort will swap the objects at 0x1000 and 0x1010 to put them in the correct order. It will do this with a series of 3 memcpy's of 16 bytes each. It has to move all those extra \0's around because it doesn't know that they are useless. Those 16-byte objects are opaque to qsort. This might be a reason to consider building a secondary array of pointers and qsorting it instead of the main array, when your main array has objects that are very large.
