简体   繁体   中英

Use of const in qsort function

The following comes from a qsort function implementation given as a solution to one of the K&R challenge questions. The challenge is to read a list of words and sort them according to the number of occurrences. The struct Word is also shown. Link to the full code: http://clc-wiki.net/wiki/K%26R2_solutions:Chapter_6:Exercise_4

typedef struct WORD
{
  char *Word;
  size_t Count;
  struct WORD *Left;
  struct WORD *Right;
} WORD;

...

CompareCounts(const void *vWord1, const void *vWord2)
{
  int Result = 0;
  WORD * const *Word1 = vWord1;
  WORD * const *Word2 = vWord2;

  assert(NULL != vWord1); 
  assert(NULL != vWord2); 

  /* ensure the result is either 1, 0 or -1 */
  if((*Word1)->Count < (*Word2)->Count)
  {
    Result = 1;
  }
  else if((*Word1)->Count > (*Word2)->Count)
  {
    Result = -1;
  }
   else
  {
    Result = 0;
  }

  return Result;
}

My question is about these lines:

WORD * const *Word1 = vWord1;
WORD * const *Word2 = vWord2;

Is this a declaration of a constant pointer to a constant variable? Or something else? And why does it have to be defined this way for the sort to work?

The source code you linked to is a small application that reads in a text sample, generates a tree data structure that contains word frequency (how many times each word appears in the text sample), and then prints out the list of words from most frequent to least frequent.

/*

  Chapter 6. Structures

          Write a program that prints out the distinct words in its 
          input sorted into decreasing order of frequency of occurrence.
          Precede each word by its count.

  Author: Bryan Williams

*/

The pattern used in this application has a classical and elegant K&R feel about it. The first step is to process the text sample generating a tree structure in which each node contains a piece of text (a word from the text sample) along with a frequency count of how many times the piece of text is found. The second step is to then sort the tree nodes by the frequency counts. The third step is to print the sorted tree nodes in order of frequency to provide a list of the text pieces found along with how many times the text piece was found in the text sample.

The tree used is a binary tree and the tree nodes have the following structure:

typedef struct WORD
{
  char *Word;          // pointer to the text piece, a word of text
  size_t Count;        // frequency count for this word
  struct WORD *Left;   // pointer to tree node child left
  struct WORD *Right;  // pointer to tree node child right
} WORD;

The tree structure is used in order to be efficient about either determining if a text piece has already been found and just incrementing the count or adding the text piece with a count of one to our data storage if it does not.

However the sorting step uses a different criteria for the order of items in the tree than for the processing of the text sample. The text sample uses the text pieces as the way to order the tree nodes but for the actual output we need an order based on the frequency counts. So we need to sort the nodes of the tree on the frequency counts.

Since this is an in memory tree, the first thought for the program would be to create a list of the tree nodes in an array and then sort the list. However sorting an array usually requires moving array elements around except for the special case of the array already being sorted. This approach would also double the amount of memory used for the tree nodes since a copy of the nodes is being made.

Rather than making a copy of the tree nodes and then sorting that list, instead the program creates a list of pointers to the tree nodes and then sorts the list of pointers by referencing the tree nodes the pointers point to.

A Bit About the qsort() interface

The function definition of CompareCounts(const void *vWord1, const void *vWord2) means that vWord1 is a pointer to a const variable whose type is unknown or could be anything.

If we look at the qsort() function declaration it looks like:

void qsort (void* base, size_t num, size_t size, int (*comparator)(const void*,const void*));

So the comparison function used with qsort() must have a compatible argument list or interface description or a modern C compiler will issue an error.

Traditionally with the comparison function used with qsort() as well as bsearch() , we would have a comparison function that would look like:

CompareCounts(const void *vWord1, const void *vWord2)

In the comparison function we would then take the void * arguments and cast them to a pointer of the the actual type that is to be used in the comparison. After that we then use the local, properly typed variables to do the comparison operation.

What qsort() does is to take two elements of the array that it wants to compare and calls the comparison function with pointers to those two elements. The use of void * is to work around the type checking of the C compiler.

The interface specified that the void * pointer is pointing to something that is const because qsort() doesn't want you to change the data that it is providing. It is asking you to test the two data items provided and indicate which is greater or lesser or equal in the collating sequence you are using to sort the data.

The reason for the void * pointer is because the qsort() function does not know what the data is or how the data is structured. qsort() only knows the number of bytes, the size, of each data element so that it can iterate through the array of items, element by element. This allows the array to be any size of struct or other type of data.

The Specific Example Comparison Function

The interface, how the arguments were casted, for CompareCounts() looked strange to me until I reviewed the source code you linked to. That program generates a tree structure then generates an array of pointers which point to the actual nodes in the tree. It is this array of pointers to nodes that is passed to qsort() to sort.

So the array of data provided to qsort() is an array each element of which points to a tree node which is a WORD element stored in the tree data structure. The array of pointers are sorted based on the data the pointers point to.

In order to access a particular node by using the array passed to the qsort() function we have to take the array element and dereference it to get the actual tree node.

Since qsort() passes a pointer to the array element, the void *vWord1 is a pointer to an array element and we have to dereference that pointer to get the actual array element, a pointer to a tree element. However it is not the pointer values we want to use as the sorting criteria but rather what the pointers point to. This requires us to dereference the pointer of the array element in order to access the data of the WORD element in the tree we want to compare.

WORD * const *Word1 = vWord1; does a cast of the void * pointer vWord1 to be a pointer to a const pointer to a WORD . What this means is that Word1 is a pointer, which qsort() uses to point to the item to be sorted, that is a pointer which is const (the array element itself which qsort() does not want you to change) and the const pointer that Word1 points to, points to a WORD (the pointer to the tree node which is the data that the array contains).

What each node of the tree contains is a text word along with a count as to how many times that word is found in a sample of text. The qsort() function is being used to sort the nodes of the tree which results from examining the text sample input from most frequent to least frequent. The list of node pointers is what is provided to qsort() .

So the sort is not sorting the array based on the array values but rather sorting based on what the array values, tree node pointers, point to.

By the way When I try a sample compile, I am seeing a warning warning C4090: 'initializing': different 'const' qualifiers with Visual Studio 2015 for the lines:

WORD * const *Word1 = vWord1;
WORD * const *Word2 = vWord2;

However with the following change the compiler warning goes away:

const WORD * const * Word1 = vWord1;
const WORD * const * Word2 = vWord2;

This change is actually in line with what qsort() is asking, that none of the data should be changed whether the pointer from the array element or the data that the pointer from the array element points to.

Anyway, the expression (*Word1)->Count is taking the pointer to the array element provided by qsort() , dereferencing it to get the pointer to the tree node, and then dereferencing the pointer to the tree node in order to get the member of the WORD struct we want to sort against. The sort uses the frequency count of the words, stored in Count as the sorting criteria.

Addendum topic: Playing with const

A question raised is that with this kind of complex definition, const WORD * const * Word1 = vWord1; what are various ways to generate compiler errors by breaking const and seeing when using const can provide an additional safeguard against inadvertently modifying something that should not be modified.

If we use this definition without the const modifier we would have WORD * * Word1 = vWord1; which means that we have a pointer variable Word1 that has a meaning of something like:

Word1 -> ptr -> WORD

where Word1 is our pointer variable which points to some unknown pointer which in turn points to a variable of type WORD .

Lets look at several different variations of definitions.

WORD * * Worda = vWord1;         // no const
WORD * * const Wordb = vWord1;   // Wordb is const pointer which points to a non-const pointer which points to a non-const WORD
WORD * const * Wordc = vWord1;   // WordC is a non-const pointer which points to a const pointer which points to a non-const WORD
const WORD * const * Wordd = vWord1;  // Wordd is a non-const pointer which points to a const pointer which points to a const WORD
const WORD * const * const Worde = vWord1;  // Worde is a const pointer which points to a const pointer which points to a const WORD

In the source code of the question with a definition of WORD * const * Word1 = vWord1; then Word1 is a non-const pointer which points to a const pointer which points to a non-const WORD . So lets look at several different kinds of assignments:

Word1 = vWord2;   // replace the value of the pointer Word1, allowed since non-const
(*Word1)++;       // Error: increment value of pointer pointed to by Word1, not allowed since pointer pointed to is const so compiler error
(*Word1)->Count++;  // increment value of variable pointed to by the pointer pointed to by Word1, allowed since non-const

Is this a declaration of a constant pointer to a constant variable?

No, it's just a pointer to a constant variable, not a constant pointer.

To make it clear, you are sorting an array of WORD* , for this specific problem, this WORD* pointer is a whole:

typedef WORD *wordptr;

Then the following declaration is more clear:

wordptr const *Word1 = vWord1;

And why does it have to be defined this way for the sort to work?

This is to make sure that a comparator won't modify the content of the pointer.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM