简体   繁体   中英

qsort() vs std::sort, comparison function philosophical difference

I was wondering why there is two entirely different approaches to specify the comparison function in qsort() { C version } vs the std::sort() .

qsort needs the comparison function like below: I don't know the reason why it needs three kinds of return values -1 , 0 , +1.

int comp(int *x, int *y){   
   return *x - *y; 
}

while the comparison function for std::sort() , which looks more consistent to me as it is written in terms of function to follows an invariant. ie if x is smaller than y function return true, x is in the right position relative to y

bool comp(int x, int y){   
       return x < y; 
}

Why do we need three values -1,0,+1 when returning a bool (or int with two value 0 , 1) is much simpler and clean?

Others have pointed out the equivalence of the two ways of doing comparisons; here's why the two approaches are followed.

In C, the comparison needs to be a function pointer. That means you always get function call and pointer indirection overhead. When qsort was designed back in the 1970s on a PDP-11 computer, the amount of overhead had to be reduced so comparison functions such as strcmp did a three-way comparison in a single function call. (Note that qsort is generally an unstable sort, so the equality case may seem useless, but it can be made stable with the appropriate pointer comparisons.)

In C++, the comparison can be inlined at template instantiation time, so much of the overhead is gone (there need not even be a function call) and the simpler convention can be used. This also means that std::sort can work with operator< overloads by default.

I'd wondered this same thing and came up with a theory based on strcmp and std::string::compare . Namely, doing a comparison can take a relatively large amount of time. To determine if an object A is less, equal, or greater than another object B , you can use:

if (A < B) {
    //do stuff where A is less
} else if (B < A) {
    //do stuff where A is greater
} else {
    //do stuff where A is equal
}

Which commonly requires the iteration of A and B to be done twice, once for A<B and once for B<A . If you check all three possibilities at the same time, you only have to iterate over the strings one time. Thus the -1, 0, 1 convention was used.

However, C++ seems to have dropped this convention. One argument I heard was that computers have changed, and due to the complexity of the code for doing the three-way comparison, doing one three-way comparison was slower and more error prone, and most of the time we didn't care about the equality case. In fact, all the standard sorting algorithms are designed to work like this, though an individual implementation might do something more interesting.

if (A < B) {
    //do stuff where A is less
} else {
    //do stuff where A is greater or equal
}

According to MSVC's timings on this test , string::operator< is nearly twice as fast as strcmp , and calling string::operator< twice was only slightly slower than doing it once. (Caching and simpler code I guess?). GCC's results are similar.

If for any two x and y, either x < y or x == y or x > y holds, then two way of giving the comparison function are equivalent. One may define == and > operators in terms of < as follows:

  • x == y is equivalent to !(x < y) && !(y < x)
  • x > y is equivalent to y < x

As you may realize, it is generally more efficient (and simpler) to implement <, <=, ==, >= and > in terms of the three-way compare operation, then implementing == in terms of < as shown above. I think that should be the reason why C (and many other languages actually) chose the three-way compare function, even though quicksort might not exploit that extra information.

C++ has operator overloading, but no three-way compare operator (neither has C), so shifting from three-way compare to the 'less' operator (despite the above mentioned potential disadvantages) allows to exploit operator overloading.

The qsort comparison function is modeled after strcmp and memcmp, which return < 0, 0, or > 0 ... which is more information than just returning a < or >= indicator ... it takes two such calls to determine whether two elements are equal. The notion of "invariant" doesn't apply here: obviously the invariant that a[i] < a[i+1] does not apply to the original array ... in fact it doesn't apply to the final array because a[i] == a[i+1] is possible. Nor does the term "consistent" apply ... the results are and must be consistent for both types of comparison functions. "cleaner" is in the eye of the beholder, and "much simpler" is an overstatement.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM