Calculating the length of a string with strlen

  char arr[]={'a','b','c'};
  int len=strlen(arr);

I know that when the char pointer reaches a '\0', this function stops and returns the distance between the array's first address and the address of the '\0'. But when I created the string this way, I didn't put a '\0' in it. So the pointer may keep moving to find a '\0', and in the process it may make an out-of-bounds access. So why didn't this code give me a warning, or produce an error?

strlen() only works correctly for zero-terminated character arrays, and what you have is not one.

The value that ends up in len is entirely dependent on what happens to be in memory starting at the address arr + 3. Reading there is already out of bounds, so the C standard leaves the behavior undefined.

If there's a zero there, you'll get 3. If there's other data before a zero, you'll get another number. If you're unlucky and there's no zero before the end of your process's readable memory, your program will crash with an out-of-bounds read.

For instance, the program

#include <stdio.h>
#include <string.h>

int main(void) {
  char blarr[] = {'d', 'e', 'f'};
  char arr[] = {'a', 'b', 'c'};
  int len = strlen(arr);
  printf("%d\n", len);
  return 0;
}

may print 6, depending on how the compiler allocates arr and blarr on the stack (and on whether it keeps the unused blarr at all).
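One way to measure such an array without running off its end is to bound the search by the array's size. The following is a minimal sketch, not a standard function: bounded_len is a hypothetical helper name, and it relies only on memchr from <string.h>.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of the C library): returns the number of
   bytes before the first '\0' in buf, or cap if no '\0' occurs within the
   first cap bytes. It never reads past buf[cap - 1]. */
size_t bounded_len(const char *buf, size_t cap) {
    const char *end = memchr(buf, '\0', cap);
    return end ? (size_t)(end - buf) : cap;
}
```

With the array from the question, bounded_len(arr, sizeof arr) yields 3, and a result equal to the capacity signals that no terminator was found inside the array.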

Your compiler doesn't warn about anything because your program is technically correct: you're passing a char* to strlen, and that's fine. The compiler just isn't smart enough to detect that this char* doesn't point to a zero-terminated string.

"So the pointer may keep moving to find a '\0', and in the process it may make an out-of-bounds access."

Yes, that's exactly what happened.

"So why didn't this code give me a warning, or produce an error?"

Because the declaration

char arr[] = {'a','b','c'};

is perfectly valid. You haven't given the compiler any indication that you intend to use arr as a string.
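If a string is what you intend, the array needs a terminator. Here is a minimal sketch of two ways to write that; the function names len_explicit and len_literal are only illustrative and do not come from the question.

```c
#include <stddef.h>
#include <string.h>

/* Terminator written out by hand: four elements, the last one is '\0'. */
size_t len_explicit(void) {
    char arr[] = {'a', 'b', 'c', '\0'};
    return strlen(arr);
}

/* String literal initializer: the compiler appends the '\0' itself,
   so arr again has four elements. */
size_t len_literal(void) {
    char arr[] = "abc";
    return strlen(arr);
}
```

Both functions hand strlen a properly terminated array, so both calls are well defined and return 3.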

A somewhat more interesting case is if you were to write

char arr[3] = "abc";

Due to a historical quirk, this is perfectly legal C, although it creates exactly the same array arr and will have exactly the same problem if you pass it to strlen. Here, though, I believe some compilers will warn, and it would certainly be an appropriate warning, since the feature is debatable and rarely used deliberately.
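The effect of the explicit size shows up in sizeof: with [3] the literal's terminator is dropped, while with [] the compiler makes room for it. A small sketch, with the names exact and deduced chosen here purely for illustration:

```c
#include <stddef.h>

/* Explicit size 3: only 'a', 'b', 'c' fit, so the literal's '\0' is dropped. */
const char exact[3] = "abc";

/* Size deduced from the literal: four elements, the last one is '\0'. */
const char deduced[] = "abc";

/* sizeof is evaluated at compile time and reflects the difference. */
const size_t exact_size = sizeof exact;
const size_t deduced_size = sizeof deduced;
```

Only deduced is safe to pass to strlen; exact is a plain three-byte array. Note that this is a C-only illustration, since C++ rejects the [3] initializer.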

Oftentimes, it is about managing expectations.

Let's start with a small thought experiment (or travel back in time to the early days of computing), where there are no programming languages, just machine code. There, you would write something like this (with CPU-specific instructions) to represent a string and compute its length:

arr: db 'a','b','c'
strlen:                         ; RDI (pointer to string) -> RAX (length of string)
                                ; RAX length counter and return value
                                ; CL used for null character test
        xor RAX, RAX            ; set RAX to 0
strlen_loop:
        mov cl, [rdi]           ; load CL with the byte pointed to by argument
        test cl,cl
        jz strlen_loop_done
        inc rdi                 ; look at next byte in argument
        inc rax                 ; increment the length counter
        jmp strlen_loop
strlen_loop_done:
        ret                     ; RAX contains the length of the zero-terminated string

Compared to that, writing the same function in C is much simpler.

  • We do not have to care about register allocation (which register does what).
  • We do not rely on the instruction set of a specific CPU.
  • We do not have to look up the "calling conventions" or ABI for the target system (argument passing conventions etc.).

#include <stddef.h>  /* for size_t */

size_t strlen(const char* s) {
  size_t l = 0;
  while (*s) {
    l++;
    s++;
  }
  return l;
}

The convention that "strings" are just pointers to chars (bytes) ending with a null terminator is admittedly quite arbitrary, but it comes with the C programming language. It is just a convention. The compiler itself knows almost nothing about it (well, it does know to add a terminating null to string literals). But when you call strlen(), it cannot distinguish the string case from the plain byte array case. Why? Because there is no specific string type.

As such, it is just about as clever as the assembly version I gave above. It relies on the C string convention. The assembler does not check, and neither does the C compiler, because, let's be honest, C's main accomplishments are the bullet items I gave above.

So if you manage your expectations about the C language, think of it as a slightly abstracted version of a glorified assembly language.

If you are annoyed by the C string convention (after all, strlen is O(n) in time complexity), you can still come up with your own string type, maybe like this:

typedef struct String_tag {
  size_t length;
  char data[];
} String_t;

Then write yourself helpers (to create a string on the heap) and macros (to create a string on the stack with alloca or something similar), and build your own string library around that type.
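As a rough starting point, a heap helper for such a type might look like the sketch below. The names string_from_bytes and string_length are hypothetical, and storing an extra '\0' after the data is an assumption made here so the buffer can still be handed to ordinary C functions; the type itself does not require it.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct String_tag {
  size_t length;
  char data[];  /* flexible array member: the bytes follow the header */
} String_t;

/* Hypothetical helper: copies n bytes from src into a newly allocated
   String_t. src does not need to be zero-terminated. Returns NULL if the
   size would overflow or the allocation fails; release with free(). */
String_t *string_from_bytes(const char *src, size_t n) {
  if (n > SIZE_MAX - sizeof(String_t) - 1) return NULL;
  String_t *s = malloc(sizeof *s + n + 1);  /* +1 for the optional '\0' */
  if (s == NULL) return NULL;
  s->length = n;
  memcpy(s->data, src, n);
  s->data[n] = '\0';  /* assumption: keep the data usable as a C string */
  return s;
}

/* O(1): the length is stored, not searched for. */
size_t string_length(const String_t *s) {
  return s->length;
}
```

With the array from the question, string_from_bytes(arr, sizeof arr) needs no terminator in arr, because the length is passed explicitly instead of being discovered by scanning.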

If you are just getting started with C, instead of tackling something bigger, I think this would be a good exercise for learning the language.
