
Why is my (re)implementation of strlen wrong?

I came up with this little piece of code, but all the professionals said it's dangerous and that I should not write code like this. Can anyone highlight its vulnerabilities in more detail?

int strlen(char *s){ 
    return (*s) ? 1 + strlen(s + 1) : 0; 
}

It has no vulnerabilities per se; this is perfectly correct code. It is prematurely pessimized, of course. Unless the compiler removes the recursion, it can run out of stack space on long strings, and its performance will suck due to the recursive calls, but otherwise it's OK.

Tail-call optimization most likely won't cope with such code, because the addition is applied after the recursive call returns, so the call is not in tail position. If you want to live dangerously and depend on tail-call optimization, you should rephrase it so that the recursive call is a tail call:

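#include <stddef.h>  // size_t

// note: size_t is an unsigned integer type

// Returns size_t rather than int so that the accumulated length is not truncated.
size_t strlen_impl(const char *s, size_t len) {
    if (*s == 0) return len;
    if (len + 1 < len) return len; // protect from overflows (unsigned wrap-around)
    return strlen_impl(s+1, len+1); // tail call: nothing is left to do after it returns
}

size_t strlen(const char *s) {
    return strlen_impl(s, 0);
}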

"Dangerous" is a bit of a stretch, but the code is needlessly recursive and likely to be less efficient than the iterative alternative.

I suppose also that, given a very long string, there is a danger of a stack overflow.
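For comparison, here is a minimal sketch of what the iterative alternative could look like. The name my_strlen is hypothetical, chosen only to avoid clashing with the standard library's strlen; the loop uses constant stack space regardless of the string's length.

```c
#include <stddef.h>  /* size_t */

/* my_strlen is a hypothetical name, not part of the original post. */
size_t my_strlen(const char *s) {
    const char *p = s;
    /* Walk forward until the terminating null byte. */
    while (*p != '\0') {
        p++;
    }
    /* The distance between the two pointers is the length. */
    return (size_t)(p - s);
}
```

Called as my_strlen("hello"), this sketch returns 5 without making any recursive calls.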

There are two serious security bugs in this code:

  1. Use of int instead of size_t for the return type. As written, strings longer than INT_MAX will cause this function to invoke undefined behavior via integer overflow. In practice, this could lead to computing strlen(huge_string) as some small value like 1, malloc'ing the wrong amount of memory, and then performing strcpy into it, causing a buffer overflow.

  2. Unbounded recursion which can overflow the stack, i.e. Stack Overflow. :-) A compiler may choose to optimize the recursion into a loop (in this case, it's possible with current compiler technology), but there is no guarantee that it will. In the best case, a stack overflow will simply crash the program. In the worst case (e.g. running on a thread with no guard page) it could clobber unrelated memory, possibly yielding arbitrary code execution.
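The first bug is hard to reproduce directly, since it needs a string longer than INT_MAX. The sketch below is a scaled-down analogue under a stated assumption: it counts in a hypothetical 8-bit unsigned type, so the count wraps at 256 instead of INT_MAX. Unlike signed int overflow, this unsigned wrap-around is well defined, so it only illustrates the "some small value like 1" outcome, not the undefined behavior itself.

```c
#include <stdint.h>  /* uint8_t */

/* narrow_strlen is a hypothetical, deliberately broken helper:
   its 8-bit counter stands in for the too-narrow int return type. */
uint8_t narrow_strlen(const char *s) {
    uint8_t n = 0;
    while (*s != '\0') {
        n++;   /* wraps from 255 back to 0 */
        s++;
    }
    return n;
}
```

For a 257-character string this returns 1, so a caller that allocated narrow_strlen(s) + 1 bytes and then copied the string into them would write far past the end of the buffer.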

The stack-exhaustion problem that has been pointed out ought to be fixed by a decent compiler, which flattens the apparent recursive call into a loop. I tested this hypothesis by asking clang to translate your code:

//sl.c
unsigned sl(char const* s) {
  return (*s) ? (1+sl(s+1)) : 0;
}

Compiling and disassembling:

clang -emit-llvm -O1 -c sl.c -o sl.o
#                 ^^ Yes, O1 is already sufficient.
llvm-dis-3.2 sl.o

And this is the relevant part of the LLVM result (sl.o.ll):

define i32 @sl(i8* nocapture %s) nounwind uwtable readonly {
  %1 = load i8* %s, align 1, !tbaa !0
  %2 = icmp eq i8 %1, 0
  br i1 %2, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse:                                      ; preds = %tailrecurse, %0
  %s.tr3 = phi i8* [ %3, %tailrecurse ], [ %s, %0 ]
  %accumulator.tr2 = phi i32 [ %4, %tailrecurse ], [ 0, %0 ]
  %3 = getelementptr inbounds i8* %s.tr3, i64 1
  %4 = add i32 %accumulator.tr2, 1
  %5 = load i8* %3, align 1, !tbaa !0
  %6 = icmp eq i8 %5, 0
  br i1 %6, label %tailrecurse._crit_edge, label %tailrecurse

tailrecurse._crit_edge:                           ; preds = %tailrecurse, %0
  %accumulator.tr.lcssa = phi i32 [ 0, %0 ], [ %4, %tailrecurse ]
  ret i32 %accumulator.tr.lcssa
}

I don't see a recursive call. Indeed, clang named the loop label tailrecurse, which hints at what it is doing here.
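To make the IR easier to read, here is a hand-written C sketch that mirrors its structure: an entry check, followed by a loop that advances the pointer and bumps an accumulator. The name sl_loop is hypothetical, and this is my reading of the IR above rather than anything clang emits.

```c
/* Hand-written mirror of the IR; sl_loop is a hypothetical name. */
unsigned sl_loop(char const *s) {
    unsigned accumulator = 0;          /* plays the role of %accumulator.tr2 */
    if (*s == 0) {
        return 0;                      /* entry branch straight to the exit block */
    }
    do {
        s = s + 1;                     /* getelementptr: advance one byte */
        accumulator = accumulator + 1; /* add i32 %accumulator.tr2, 1 */
    } while (*s != 0);                 /* branch back to the tailrecurse label */
    return accumulator;
}
```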

So, finally (tl;dr): yes, this code is perfectly safe, and a decent compiler with a decent flag will iron the recursion out.
