qsort to compare strings for alphabetical order
I'm using the qsort() declared in stdlib.h to sort an array of structures that each hold a string. It's essentially an array of strings, but each string is wrapped in a structure. For example:
typedef struct node {
char name[MAX_SIZE + 1];
} Node;
Then my array of nodes that contains the names would be:
Node nodes_list[MAX_SIZE + 1];
My question is, I want to sort nodes_list so that when I print the following:
for (i = 0; i < size; i++) {
printf("%s\n", nodes_list[i].name);
}
it prints all the names in alphabetical order. I would like to sort the list using qsort, and my comparator function is this:
int compare(const void *a, const void *b) {
const char **ia = (const char **)a;
const char **ib = (const char **)b;
return strcmp(*ia, *ib);
}
When I run the function with qsort:
qsort(nodes_list, size, sizeof(Node), compare);
I get a segmentation fault (core dumped). I know this snippet of code causes the segmentation fault because without it I can print the list of names fine (not sorted, of course). Can someone help?
Your comparison function is wrong for your array format. Here's a simple checklist you can follow to get the types and sizes right when using qsort:
1. The third argument to qsort should be sizeof *x, where x is the first argument.
2. In the comparison function, casts from void * aren't necessary.
3. You shouldn't need to cast away const, but if you do, it's because you've put the const in the wrong place. To assign a const void * successfully without a cast, the destination type should have exactly one * after the const keyword. const char * and char const * are OK (and equivalent to each other); const char *const * is also OK (and different); const char ** is wrong. And if you can't put a const before the * because you don't have a *, because you typedef'ed the pointer type, this is why you shouldn't do that.
4. Apart from the added const, the type of the pointers declared at the beginning of the comparison function should be exactly the same as the type of the first argument to qsort, after applying the "array decays to pointer" rule if the first argument to qsort is the name of an array.
In your case, the first argument to qsort is nodes_list, which is an array of Node, so apply the decay-to-pointer rule and you get a Node *, then add a const and you get:
const Node *a_node = a;
const Node *b_node = b;
Now you have a nice pair of properly typed pointers, and you simply compare them in the obvious way:
return strcmp(a_node->name, b_node->name);
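Putting the pieces above together, a minimal sketch of the corrected comparator and the qsort call might look like the following. The value of MAX_SIZE and the helper name sort_nodes are assumptions for illustration; neither is given in the question.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 15  /* assumed value; the question does not state it */

typedef struct node {
    char name[MAX_SIZE + 1];
} Node;

/* qsort passes the comparator pointers to two array elements, i.e. Node *. */
static int compare_nodes(const void *a, const void *b)
{
    const Node *a_node = a;  /* no cast needed from const void * */
    const Node *b_node = b;
    return strcmp(a_node->name, b_node->name);
}

/* Hypothetical helper: sort the first `size` entries of `list` by name. */
static void sort_nodes(Node *list, size_t size)
{
    /* sizeof *list is the size of one element, here sizeof(Node). */
    qsort(list, size, sizeof *list, compare_nodes);
}
```

Calling sort_nodes(nodes_list, size) before the printing loop should then print the names in strcmp order. Note that strcmp compares byte values, so with ASCII all uppercase letters sort before lowercase ones.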
To explain why rule #4 works, you have to look closely at the memory layout. Suppose that MAX_SIZE is 15, so that MAX_SIZE + 1 is a nice round 16. Then your Node type contains a 16-byte array of char, and your nodes_list contains 16 of those, for a total of 16 * 16 = 256 bytes. Suppose that nodes_list is located at memory address 0x1000. Then the layout is:
+---------------+---------------+               +---------------+
| nodes_list[0] | nodes_list[1] |...............| nodes_list[15]|
+---------------+---------------+               +---------------+
^               ^                               ^               ^
0x1000          0x1010                          0x10f0          0x1100
Addresses 0x1000 through 0x10ff are actually part of the object. 0x1100 is the trailing edge, one byte past the end.
Suppose further that the array is half full (size is 8) and that it is populated with these 8 strings:
Hotel Foxtrot Echo Charlie Golf Delta Bravo Alpha
and that the unused portions are filled with 0's. The object is made up of these 256 bytes (I've added spaces and line breaks for illustration purposes):
H o t e l \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
F o x t r o t \0 \0 \0 \0 \0 \0 \0 \0 \0
E c h o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
C h a r l i e \0 \0 \0 \0 \0 \0 \0 \0 \0
G o l f \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
D e l t a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
B r a v o \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
A l p h a \0 \0 \0 \0 \0 \0 \0 \0 \0 \0 \0
... 128 more \0's
Now, you pass qsort the starting address of this block of memory (first arg, nodes_list, 0x1000) plus 2 pieces of information about its internal structure: the number of elements (2nd arg, size, 8) and the size of each element (3rd arg, sizeof(Node), 16). With that information it knows that the elements of the array are at addresses 0x1000, 0x1010, 0x1020, ... 0x1070. It picks a pair of them. Which pair it chooses depends on what sorting algorithm it uses; let's say for simplicity that it is a stupid bubble sort, which starts by comparing the first two elements.
qsort calls your comparison function with the addresses of the elements, 0x1000 and 0x1010. It doesn't know their types, but it knows their sizes. Each one is an array element occupying 16 bytes.
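To make the address arithmetic concrete, here is a rough sketch of the hypothetical bubble sort described above, written with the same parameters as qsort. The name bubble_qsort and the fixed 256-byte scratch buffer are assumptions made for illustration; real qsort implementations use faster algorithms and handle elements of any size.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SIZE 15  /* assumed value, matching the example in the text */

typedef struct node {
    char name[MAX_SIZE + 1];
} Node;

static int compare_nodes(const void *a, const void *b)
{
    const Node *a_node = a;
    const Node *b_node = b;
    return strcmp(a_node->name, b_node->name);
}

/* Hypothetical stand-in for qsort: a plain bubble sort. It knows only the
 * base address, the element count, and the element size. */
static void bubble_qsort(void *base, size_t nmemb, size_t size,
                         int (*compar)(const void *, const void *))
{
    unsigned char *bytes = base;  /* byte pointer for address arithmetic */
    unsigned char tmp[256];       /* scratch space; this sketch gives up
                                     on elements larger than 256 bytes */

    if (nmemb < 2 || size == 0 || size > sizeof tmp)
        return;

    for (size_t pass = 0; pass + 1 < nmemb; pass++) {
        for (size_t i = 0; i + 1 < nmemb - pass; i++) {
            unsigned char *left  = bytes + i * size;        /* element i     */
            unsigned char *right = bytes + (i + 1) * size;  /* element i + 1 */

            /* The comparator receives element addresses, nothing more. */
            if (compar(left, right) > 0) {
                /* Swap whole elements, `size` bytes at a time. */
                memcpy(tmp, left, size);
                memcpy(left, right, size);
                memcpy(right, tmp, size);
            }
        }
    }
}
```

With base at 0x1000 and size equal to 16, the first comparison in this sketch is between the addresses 0x1000 and 0x1010, which matches the walkthrough.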
Your comparison function receives a=0x1000 and b=0x1010. They are pointers to 16-byte objects; specifically, they each point to a Node. If you do the wrong thing and cast them to char **, what happens? Well, you get a char ** with value 0x1000, and you have to dereference that char ** to get a char * to pass to strcmp, so you do that dereference and end up loading the bytes 'H', 'o', 't', 'e' as a pointer value (assuming your pointers are 4 bytes long). On a big-endian machine with ASCII as the charset, this is a pointer to memory address 0x486f7465, which you pass to strcmp. strcmp tries to read from that bogus address and crashes. The result of trying Node ** is basically the same.
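The bogus address can be reproduced safely, without dereferencing anything, by combining the first four bytes of the string by hand. This is only an illustrative sketch: the function names are made up, and it assumes ASCII and 4-byte pointers as in the explanation above.

```c
#include <stdint.h>

/* Combine four bytes most significant first, which is how a big-endian
 * machine would read them as a 4-byte pointer value. */
static uint32_t bytes_as_big_endian(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* The same four bytes read least significant first, as a little-endian
 * machine would read them. */
static uint32_t bytes_as_little_endian(const unsigned char *p)
{
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8)  |  (uint32_t)p[0];
}
```

For the bytes of "Hotel" in ASCII ('H' is 0x48, 'o' is 0x6f, 't' is 0x74, 'e' is 0x65), the first function gives 0x486f7465 and the second gives 0x65746f48. Neither is likely to be a valid address in the process.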
Another good thing to know is how qsort uses the element size when it reorders the array. The 3rd arg is not just the size of an object that the comparison acts on; it's also the size of the object that gets moved as a unit when reordering the array. After your comparison function returns a positive value (strcmp("Hotel", "Foxtrot") is greater than 0), our hypothetical bubble sort implementation of qsort will swap the objects at 0x1000 and 0x1010 to put them in the correct order. It will do this with a series of 3 memcpy's of 16 bytes each. It has to move all those extra \0's around because it doesn't know that they are useless. Those 16-byte objects are opaque to qsort. This might be a reason to consider building a secondary array of pointers and qsorting it instead of the main array, when your main array has objects that are very large.
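One way the secondary-array idea could look is sketched below. The names compare_node_ptrs and sort_node_index are hypothetical, and MAX_SIZE is again assumed to be 15. Because the array being sorted now holds pointers, the comparator receives pointers to pointers, and the checklist gives the type const Node *const *.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SIZE 15  /* assumed value, matching the example in the text */

typedef struct node {
    char name[MAX_SIZE + 1];
} Node;

/* Each element of the index array is a `const Node *`, so qsort hands the
 * comparator the address of such a pointer: `const Node *const *`. */
static int compare_node_ptrs(const void *a, const void *b)
{
    const Node *const *a_ptr = a;
    const Node *const *b_ptr = b;
    return strcmp((*a_ptr)->name, (*b_ptr)->name);
}

/* Hypothetical helper: fill `index` with pointers to the first `size`
 * entries of `list`, then sort the pointers. `list` itself is not moved. */
static void sort_node_index(const Node *list, const Node **index, size_t size)
{
    for (size_t i = 0; i < size; i++)
        index[i] = &list[i];

    /* Only pointer-sized elements get swapped now. */
    qsort(index, size, sizeof *index, compare_node_ptrs);
}
```

Printing index[i]->name in a loop would then give the sorted order while nodes_list stays in its original order. For 16-byte nodes the saving is small; the approach pays off mainly when each element is much larger than a pointer.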