
qsort function compare confused me

I see lots of people use subtraction in a qsort comparator function. I think it is wrong because it fails when dealing with numbers like these: int nums[]={-2147483648,1,2,3}; INT_MIN = -2147483648;

int compare (const void * a, const void * b)
{
  return ( *(int*)a - *(int*)b );
}

I wrote this program to test:

#include <stdio.h>
#include <limits.h>

int compare (const void * a, const void * b)
{
    return ( *(int*)a - *(int*)b );
}

int main(void)
{
    int a = 1;
    int b = INT_MIN;
    printf("%d %d\n", a,b);
    printf("%d\n",compare((void *)&a,(void *)&b));
    return 0;
}

The output is:

1 -2147483648
-2147483647

But a > b, so the output should be positive. I have seen many books write the comparator like this. I think it is wrong; it should be written like this when dealing with int types:

int compare (const void * a, const void * b)
{
    if(*(int *)a < *(int *)b)
        return -1;
    else if(*(int *)a > *(int *)b)
        return 1;
    else 
        return 0;
}

I just cannot figure out why many books and web sites write in such a misleading way. If you have any different view, please let me know.

"I think it is wrong"

Yes, a simple subtraction can lead to int overflow, which is undefined behavior and should be avoided.

return *(int*)a - *(int*)b;  // Potential undefined behavior.
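To see which pairs are affected, the difference can be computed in a wider type and checked against the int range. This is only a sketch: the name subtraction_overflows is made up for illustration, and it assumes long long is wider than int, which holds on mainstream platforms but is not guaranteed by the C Standard.

```c
#include <limits.h>

/* Hypothetical helper: returns 1 if a - b cannot be represented in an int,
   0 otherwise. Assumes long long is wider than int. */
int subtraction_overflows(int a, int b)
{
    long long diff = (long long)a - (long long)b;  /* exact in the wider type */
    return diff > INT_MAX || diff < INT_MIN;
}
```

For the pair from the question, a = 1 and b = INT_MIN, the exact difference is 2147483649 with a 32-bit int, one more than INT_MAX allows, so the helper reports an overflow.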

A common idiom is to subtract the results of two integer comparisons. Various compilers recognize this and generate efficient, well-behaved code. Preserving const-ness is also good form.

const int *ca = a;
const int *cb = b;
return (*ca > *cb) - (*ca < *cb);
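Wrapped in a full function and passed to qsort, the fragment above might look like the following sketch. The names compare_int and sort_ints are chosen here for illustration only.

```c
#include <stdlib.h>

/* Three-way comparison of two ints without subtracting the values,
   so no overflow is possible. Returns -1, 0 or 1. */
int compare_int(const void *a, const void *b)
{
    const int *ca = a;
    const int *cb = b;
    return (*ca > *cb) - (*ca < *cb);
}

/* Hypothetical helper: sort n ints in ascending order with qsort. */
void sort_ints(int *v, size_t n)
{
    qsort(v, n, sizeof v[0], compare_int);
}
```

Calling sort_ints on an array that mixes INT_MIN and INT_MAX with small values orders it correctly, because the comparator never subtracts the values themselves.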

"why many books and web sites write in such a misleading way."

return *a - *b; is conceptually easy to digest, even if it gives the wrong answer for extreme values. Learner code often omits edge conditions to get the idea across, "knowing" that values will never be large.

Or consider the complexities of comparing long doubles with regard to NaN.

Your understanding is absolutely correct. The common subtraction idiom cannot be used for arbitrary int values.

Your proposed solution works correctly, although it would be more readable with local variables to avoid so many casts:

int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    if (*aa < *bb)
        return -1;
    else if (*aa > *bb)
        return 1;
    else 
        return 0;
}

Note that modern compilers will generate the same code with or without these local variables: always prefer the more readable form.

A more compact solution with exactly the same result is commonly used, although it is a bit more difficult to understand:

int compare(const void *a, const void *b) {
    const int *aa = a;
    const int *bb = b;
    return (*aa > *bb) - (*aa < *bb);
}

Note that this approach works for all numeric types, but it returns 0 whenever either operand is a floating point NaN.
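Returning 0 for every comparison involving a NaN makes a NaN compare "equal" to both 1.0 and 2.0, which is not a consistent ordering for qsort. One possible way around this, sketched below under the assumption that NaNs should simply be grouped after all other values, is to test for NaN explicitly. The name compare_double_nan_last is hypothetical.

```c
#include <math.h>

/* Sketch: ascending order for doubles, with every NaN placed after
   all ordinary values and all NaNs treated as equal to each other. */
int compare_double_nan_last(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    int x_nan = isnan(x) != 0;
    int y_nan = isnan(y) != 0;

    if (x_nan || y_nan)
        return x_nan - y_nan;      /* both operands are 0 or 1, no overflow */
    return (x > y) - (x < y);
}
```

Placing NaNs last is a design choice, not a requirement; placing them first works equally well as long as the rule is applied consistently.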

As for your remark, "I just cannot figure out why many books and web sites write in such a misleading way":

  • Many books and websites contain mistakes, and so do most programs. Many programming bugs get caught and squashed before they reach production if the program is tested wisely. Code fragments in books are not tested, and although they never reach production, the bugs they contain do propagate virally via unsuspecting readers who learn bogus methods and idioms. A very bad and lasting side effect.

  • Kudos to you for catching this! You have a rare skill among programmers: you are a good reader. There are far more programmers who write code than programmers who can read code correctly and see mistakes. Hone this skill by reading other people's code, on Stack Overflow or in open source projects... And do report the bugs.

  • The subtraction method is in common use. Like you, I have seen it in many places, and it does work for most value pairs. This bug may go unnoticed for eons. A similar problem was latent in the zlib for decades: int m = (a + b) / 2; causes a fateful integer overflow for large int values of a and b.

  • The author probably saw it used and thought the subtraction was cool and fast, worth showing in print.

  • Note however that the erroneous function does work correctly for types smaller than int: signed or unsigned char and short, if these types are indeed smaller than int on the target platform, which the C Standard does not mandate.

  • Indeed similar code can be found in The C Programming Language by Brian Kernighan and Dennis Ritchie, the famous K&R "C bible" written by the language's inventors. They use this approach in a simplistic implementation of strcmp() in chapter 5. The code in the book is dated, going all the way back to the late seventies. Although it has implementation-defined behavior, it does not invoke undefined behavior on any but the rarest architectures, among them the infamous DeathStation-9000, yet it should not be used to compare int values.
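The midpoint computation mentioned in the list above has a well-known overflow-free form. A minimal sketch, assuming 0 <= lo <= hi as is the case for array indices in a binary search (the function name midpoint is illustrative):

```c
/* Midpoint of lo..hi without computing lo + hi.
   Assumes 0 <= lo <= hi, so hi - lo cannot overflow. */
int midpoint(int lo, int hi)
{
    return lo + (hi - lo) / 2;
}
```

With lo = 2000000000 and hi = 2100000000 the sum lo + hi would exceed a 32-bit INT_MAX, while hi - lo is only 100000000.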

You are correct: *(int*)a - *(int*)b poses a risk of integer overflow and ought to be avoided as a method of comparing two int values.

It could be valid code in a controlled situation where one knows the values are such that the subtraction will not overflow. In general, though, it should be avoided.

The reason why so many books are wrong is likely the root of all evil: the K&R book. In chapter 5.5 they try to teach how to implement strcmp:

int strcmp(char *s, char *t)
{
  int i;
  for (i = 0; s[i] == t[i]; i++)
    if (s[i] == '\0')
      return 0;
  return s[i] - t[i];
}

This code is questionable since char has implementation-defined signedness. Ignoring that, and ignoring that they fail to use const correctness as in the standard C version, the code otherwise works, partly because it relies on implicit type promotion to int (which is ugly), and partly because they assume 7-bit ASCII, where the worst case 0 - 127 cannot overflow.
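For comparison, here is a sketch that sidesteps both concerns: it takes const pointers and compares the characters as unsigned char, which is how the C Standard specifies the sign of the result of the real strcmp. The name my_strcmp is hypothetical; this is not the library's implementation.

```c
/* Sketch of a strcmp-like function that does not depend on the
   signedness of plain char. Returns -1, 0 or 1. */
int my_strcmp(const char *s, const char *t)
{
    const unsigned char *us = (const unsigned char *)s;
    const unsigned char *ut = (const unsigned char *)t;

    /* Advance while the strings agree and have not ended. */
    while (*us != '\0' && *us == *ut) {
        us++;
        ut++;
    }
    return (*us > *ut) - (*us < *ut);
}
```

Subtracting two unsigned char values promoted to int would also be safe on platforms where int is wider than char; the comparison form is used here only to stay consistent with the idiom discussed in this thread.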

Further down in the book, in 5.11, they try to teach how to use qsort:

qsort((void**) lineptr, 0, nlines-1,
  (int (*)(void*,void*))(numeric ? numcmp : strcmp));

Ignoring the fact that this code invokes undefined behavior, since strcmp is not compatible with the function pointer type int (*)(void*, void*), they teach readers to use the above method from strcmp.
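With the standard library qsort (not the book's own qsort, which has a different signature), the function pointer cast can be avoided by writing a small wrapper with the exact type qsort expects. A sketch, with compare_strings and sort_lines as illustrative names:

```c
#include <stdlib.h>
#include <string.h>

/* qsort passes pointers to the array elements. For an array of char *,
   each argument therefore points to a char *, not to the characters. */
int compare_strings(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Hypothetical helper: sort an array of nlines string pointers. */
void sort_lines(char **lineptr, size_t nlines)
{
    qsort(lineptr, nlines, sizeof lineptr[0], compare_strings);
}
```

Because compare_strings already has the type int (*)(const void *, const void *), no cast is needed at the call site.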

However, their numcmp function looks like this:

/* numcmp: compare s1 and s2 numerically */
int numcmp(char *s1, char *s2)
{
  double v1, v2;
  v1 = atof(s1);
  v2 = atof(s2);
  if (v1 < v2)
    return -1;
  else if (v1 > v2)
    return 1;
  else
    return 0;
}

Ignoring the fact that atof has no way to report errors (invalid input, such as the very likely locale mismatch between . and , silently yields a wrong value, and the behavior is undefined if the result cannot be represented), they actually manage to teach the correct method of writing such a comparison function. Since this function uses floating point, there's really no other way to write it.
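One way to make the string-to-number step checkable is strtod, which reports where parsing stopped. The following is a sketch only: parse_double is a hypothetical helper, and it deliberately rejects any trailing characters, including whitespace.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: converts s to a double.
   Returns 1 and stores the value in *out on success; returns 0 if
   nothing was parsed, if characters are left over, or on range error. */
int parse_double(const char *s, double *out)
{
    char *end;
    double v;

    errno = 0;
    v = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE)
        return 0;
    *out = v;
    return 1;
}
```

A comparison function can then decide explicitly how to order lines that fail to parse, instead of treating them as whatever atof happens to return.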

Now someone might want to come up with an int version of this. If they base it on the strcmp implementation rather than the floating point implementation, they'll get bugs.

Overall, just by flipping a few pages in this once canonical book, we have already found some 3-4 cases of reliance on undefined behavior and 1 case of reliance on implementation-defined behavior. So it is really no wonder if people who learn C from this book write code full of undefined behavior.

First, it's of course correct that an integer overflow during the comparison could create serious problems for you.

On the other hand, doing a single subtraction is cheaper than going through an if/then/else, and the comparison gets performed O(n log n) times on average in a quicksort (O(n^2) in the worst case), so if this sort is performance-critical and we can get away with it, we may want to use the difference.

It will work fine so long as all the values lie in a range whose size is less than 2^31 (assuming a 32-bit int), because then the difference between any two of them cannot overflow. So if whatever generates the list you want to sort keeps the values between minus one billion and one billion, you're fine using subtraction.

Note that checking that the values are in such a range prior to the sort is an O(n) operation.
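Such a pre-check could be sketched as a single pass that tracks the minimum and maximum and tests whether their difference fits in an int. The name subtraction_is_safe is hypothetical; the caller would pick the subtracting comparator only when it returns 1.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical pre-check: returns 1 if max - min over v[0..n-1] fits in
   an int, so subtracting any two elements cannot overflow; 0 otherwise. */
int subtraction_is_safe(const int *v, size_t n)
{
    int lo, hi;
    size_t i;

    if (n == 0)
        return 1;
    lo = hi = v[0];
    for (i = 1; i < n; i++) {
        if (v[i] < lo) lo = v[i];
        if (v[i] > hi) hi = v[i];
    }
    /* hi - lo overflows exactly when lo < 0 and hi > INT_MAX + lo.
       INT_MAX + lo itself cannot overflow when lo is negative. */
    return lo >= 0 || hi <= INT_MAX + lo;
}
```

For the array from the question, {INT_MIN, 1, 2, 3}, the check returns 0, so the three-way comparison would be the one to use.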

On the other hand, if there's a chance that the overflow could happen, you'd want to use something like the code you wrote in your question.

Note that lots of code you see doesn't explicitly take overflow into account; it's just that overflow handling is perhaps more expected in something that's more obviously an "arithmetic" context.
