Clang Analyzer false positive or overflow?

Below is a simplification of some of our code that seems to demonstrate a bug in the Clang analyzer, though it's possible there's a real bug in our code.

typedef enum {
    value1 =  0x8000, /*If value1 is initialized at < 0x8000,
                        the bug doesn't occur*/
    value2,
    value3,
    value4,
    value5,
    value6
}myEnum;

static bool test_UTIL(bool aBool, UINT16 iCaseValue)
{
    bool canMatch = true;
    int myValue; /*not initialized*/

    if (aBool)
        myValue = 1;  /*initialized */
    else
        canMatch = ((value1 == iCaseValue)
             || (value2 == iCaseValue)
             || (value3 == iCaseValue)
             || (value4 == iCaseValue)
             || (value5 == iCaseValue)
             || (value6 == iCaseValue));

    if (canMatch)
    {
        switch (iCaseValue) 
        {
            case value1:
            case value2:
            case value3:
            case value4:
            case value5:
            case value6:
                break;

            default:
                /*This triggers a clang warning, claiming myValue is undefined*/
                canMatch = (iCaseValue == myValue);
                break;
        }
    }

    return canMatch;
}

As noted in the comment, the warning only appears when the enumeration starts at 0x8000 or above, which would be the sign bit of a 16-bit value if it were signed. Is it possible that an implicit conversion to a signed 16-bit integer in the switch statement is somehow causing an overflow? Or is Clang confused?

Of course, this example could likely be refactored to achieve equivalent behavior, but the original it is based on is 20+ year old code that is not worth rewriting just to satisfy a faulty analyzer warning.

Edit: I've added the assembly generated for the test_UTIL() function below. I can't read assembly well enough to spot a problem here, though others may be interested in it:

_test_UTIL:                             ## @test_UTIL
Ltmp15:
    .cfi_startproc
Lfunc_begin1:
    .loc    1 24 0                  ## /Users/jbrooks/Desktop/test/test/main.c:24:0
## BB#0:
    pushq   %rbp
Ltmp16:
    .cfi_def_cfa_offset 16
Ltmp17:
    .cfi_offset %rbp, -16
    movq    %rsp, %rbp
Ltmp18:
    .cfi_def_cfa_register %rbp
    movw    %si, %ax
    movl    %edi, -4(%rbp)
    movw    %ax, -6(%rbp)
    .loc    1 25 22 prologue_end    ## /Users/jbrooks/Desktop/test/test/main.c:25:22
Ltmp19:
    movl    $1, -12(%rbp)
    .loc    1 28 2                  ## /Users/jbrooks/Desktop/test/test/main.c:28:2
    cmpl    $0, -4(%rbp)
    je  LBB1_2
## BB#1:
    .loc    1 29 3                  ## /Users/jbrooks/Desktop/test/test/main.c:29:3
    movl    $1, -16(%rbp)
    jmp LBB1_9
LBB1_2:
    movb    $1, %al
    movl    $32768, %ecx            ## imm = 0x8000
    .loc    1 31 3                  ## /Users/jbrooks/Desktop/test/test/main.c:31:3
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#3:
    movb    $1, %al
    movl    $32769, %ecx            ## imm = 0x8001
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#4:
    movb    $1, %al
    movl    $32770, %ecx            ## imm = 0x8002
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#5:
    movb    $1, %al
    movl    $32771, %ecx            ## imm = 0x8003
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#6:
    movb    $1, %al
    movl    $32772, %ecx            ## imm = 0x8004
    movzwl  -6(%rbp), %edx
    cmpl    %edx, %ecx
    movb    %al, -17(%rbp)          ## 1-byte Spill
    je  LBB1_8
## BB#7:
    movl    $32773, %eax            ## imm = 0x8005
    movzwl  -6(%rbp), %ecx
    cmpl    %ecx, %eax
    sete    %dl
    movb    %dl, -17(%rbp)          ## 1-byte Spill
LBB1_8:
    movb    -17(%rbp), %al          ## 1-byte Reload
    andb    $1, %al
    movzbl  %al, %ecx
    movl    %ecx, -12(%rbp)
LBB1_9:
    .loc    1 38 2                  ## /Users/jbrooks/Desktop/test/test/main.c:38:2
    cmpl    $0, -12(%rbp)
    je  LBB1_14
## BB#10:
    .loc    1 40 3                  ## /Users/jbrooks/Desktop/test/test/main.c:40:3
Ltmp20:
    movzwl  -6(%rbp), %eax
    leal    -32768(%rax), %eax
    cmpl    $5, %eax
    ja  LBB1_12
    jmp LBB1_11
LBB1_11:
    .loc    1 48 5                  ## /Users/jbrooks/Desktop/test/test/main.c:48:5
Ltmp21:
    jmp LBB1_13
LBB1_12:
    .loc    1 52 5                  ## /Users/jbrooks/Desktop/test/test/main.c:52:5
    movzwl  -6(%rbp), %eax
    cmpl    -16(%rbp), %eax
    sete    %cl
    andb    $1, %cl
    movzbl  %cl, %eax
    movl    %eax, -12(%rbp)
Ltmp22:
LBB1_13:
LBB1_14:
    .loc    1 57 2                  ## /Users/jbrooks/Desktop/test/test/main.c:57:2
    movl    -12(%rbp), %eax
    popq    %rbp
    ret
Ltmp23:
Lfunc_end1:

One unknown is the underlying integer type chosen by the compiler to represent myEnum. This is “implementation-defined” in the sense that the choice needs to be deterministic for separately compiled files to be linkable together, but not necessarily in the sense that the compiler's documentation explains how the type is chosen. The choice depends on the enum's definition, and any description could only be an algorithm.
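One way to see what a given compiler picked is to ask it directly. The following is only a sketch: the helper names my_enum_size and my_enum_is_signed are made up for illustration, and the result can change with compiler, target, and flags.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    value1 = 0x8000,
    value2,
    value3,
    value4,
    value5,
    value6
} myEnum;

/* Size in bytes of the integer type the compiler chose for myEnum. */
size_t my_enum_size(void)
{
    return sizeof(myEnum);
}

/* True if that type is signed. Converting -1 to an unsigned type yields
   its maximum value, which is never less than zero. */
bool my_enum_is_signed(void)
{
    return (myEnum)-1 < (myEnum)0;
}
```

Printing both results for the 0x8000 variant and for a variant starting below 0x8000 would show whether the representation actually differs between the two cases.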

Regardless of this uncertainty, I think the function is well defined: it does not read from an uninitialized myValue for any arguments. In other words, the warning is a false positive. I have “verified” this with another static analyzer that detects uses of uninitialized memory.
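Because the input space is small (two values of aBool times 65536 values of iCaseValue), the same claim can also be checked by brute force. This is a sketch, not the analyzer run mentioned above: it assumes UINT16 is uint16_t, and reads_uninitialized, count_uninitialized_reads, and the myValueSet flag are hypothetical names. The flag stands in for myValue and records whether it has been assigned.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t UINT16; /* assumption: UINT16 is a 16-bit unsigned type */

typedef enum {
    value1 = 0x8000,
    value2,
    value3,
    value4,
    value5,
    value6
} myEnum;

/* Instrumented copy of test_UTIL. Returns true if the default branch
   would read myValue before it has been assigned. */
static bool reads_uninitialized(bool aBool, UINT16 iCaseValue)
{
    bool canMatch = true;
    bool myValueSet = false; /* tracks whether myValue was assigned */

    if (aBool)
        myValueSet = true;   /* "myValue = 1" in the original */
    else
        canMatch = ((value1 == iCaseValue)
                 || (value2 == iCaseValue)
                 || (value3 == iCaseValue)
                 || (value4 == iCaseValue)
                 || (value5 == iCaseValue)
                 || (value6 == iCaseValue));

    if (canMatch)
    {
        switch (iCaseValue)
        {
            case value1:
            case value2:
            case value3:
            case value4:
            case value5:
            case value6:
                break;

            default:
                /* The original reads myValue here. */
                return !myValueSet;
        }
    }

    return false;
}

/* Counts the inputs for which myValue would be read uninitialized. */
unsigned long count_uninitialized_reads(void)
{
    unsigned long bad = 0;

    for (int b = 0; b <= 1; ++b)
        for (uint32_t v = 0; v <= UINT16_MAX; ++v)
            if (reads_uninitialized(b != 0, (UINT16)v))
                ++bad;

    return bad;
}
```

A count of zero means the default branch is only reached when aBool is true, that is, after myValue has been assigned.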

What you could do to lift the “integer type for myEnum” uncertainty is post the assembly code that clang-the-compiler generates. If there is an uninitialized access in the assembly code, it will be easier to understand why.


Here is what might be happening, though a full-featured static analyzer such as Clang's is a complex beast, and an explanation from someone unfamiliar with its internals should be taken with a grain of salt: the underlying integer type chosen for myEnum may be different when 0x8000 is picked for value1 than when a smaller value is used. For smaller values, the underlying type for myEnum could be a signed 16-bit short int, whereas 0x8000 forces the compiler to use an unsigned short int. This different type for myEnum would introduce more implicit conversions in the Abstract Syntax Tree representing the function, making its behavior harder for the analyzer to predict and causing the false positive. I do not work on Clang, but I can assure you that these implicit conversions are always a pain to handle in a static analyzer for C.
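As for whether the comparison itself can go wrong at the sign bit: in C, both operands of == undergo the usual arithmetic conversions, so a 16-bit unsigned value is promoted to int before it is compared with an enumeration constant. The movzwl instructions in the listing above are consistent with this, since they zero-extend iCaseValue. The sketch below assumes UINT16 is uint16_t and that int is wider than 16 bits; first_value and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint16_t UINT16; /* assumption: UINT16 is a 16-bit unsigned type */

enum { first_value = 0x8000 }; /* stands in for value1 */

/* Compares the way the original code does: v is promoted to int,
   so 0x8000 stays 32768 and no sign bit is involved. */
bool matches_after_promotion(UINT16 v)
{
    return first_value == v;
}

/* Compares after forcing the value through a signed 16-bit type.
   The out-of-range conversion is implementation-defined; on common
   two's-complement targets 0x8000 becomes -32768. */
bool matches_after_signed_narrowing(UINT16 v)
{
    int16_t narrowed = (int16_t)v;
    return first_value == narrowed;
}
```

Only the second function loses the match at 0x8000, and the original code contains no such narrowing conversion, which fits the view that the compiled code is fine and the warning comes from the analyzer's modelling.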


Clang developers consider false positives to be bugs, and they would certainly like to hear about this one. The homepage says:

Please help us in this endeavor by reporting false positives

and this sentence links directly to the explanation of how to file bugs.
