Understanding code in strlen implementation

I have two questions about glibc's implementation of strlen (declared in string.h).

  1. The implementation uses a magic number with 'holes'. I can't work out how this works. Can someone please help me understand this snippet:

     #include <string.h>   /* size_t */
     #include <stdlib.h>   /* abort */

     size_t
     strlen (const char *str)
     {
       const char *char_ptr;
       const unsigned long int *longword_ptr;
       unsigned long int longword, himagic, lomagic;

       /* Handle the first few characters by reading one character at a time.
          Do this until CHAR_PTR is aligned on a longword boundary.  */
       for (char_ptr = str; ((unsigned long int) char_ptr
                             & (sizeof (longword) - 1)) != 0;
            ++char_ptr)
         if (*char_ptr == '\0')
           return char_ptr - str;

       /* All these elucidatory comments refer to 4-byte longwords,
          but the theory applies equally well to 8-byte longwords.  */

       longword_ptr = (unsigned long int *) char_ptr;

       /* Bits 31, 24, 16, and 8 of this number are zero.  Call these bits
          the "holes."  Note that there is a hole just to the left of
          each byte, with an extra at the end:

          bits:  01111110 11111110 11111110 11111111
          bytes: AAAAAAAA BBBBBBBB CCCCCCCC DDDDDDDD

          The 1-bits make sure that carries propagate to the next 0-bit.
          The 0-bits provide holes for carries to fall into.  */
       himagic = 0x80808080L;
       lomagic = 0x01010101L;
       if (sizeof (longword) > 4)
         {
           /* 64-bit version of the magic.  */
           /* Do the shift in two steps to avoid a warning if long has 32 bits.  */
           himagic = ((himagic << 16) << 16) | himagic;
           lomagic = ((lomagic << 16) << 16) | lomagic;
         }
       if (sizeof (longword) > 8)
         abort ();

       /* Instead of the traditional loop which tests each character,
          we will test a longword at a time.  The tricky part is testing
          if *any of the four* bytes in the longword in question are zero.  */
       for (;;)
         {
           longword = *longword_ptr++;

           if (((longword - lomagic) & ~longword & himagic) != 0)
             {
               /* Which of the bytes was the zero?  If none of them were, it was
                  a misfire; continue the search.  */
               const char *cp = (const char *) (longword_ptr - 1);

               if (cp[0] == 0)
                 return cp - str;
               if (cp[1] == 0)
                 return cp - str + 1;
               if (cp[2] == 0)
                 return cp - str + 2;
               if (cp[3] == 0)
                 return cp - str + 3;
               if (sizeof (longword) > 4)
                 {
                   if (cp[4] == 0)
                     return cp - str + 4;
                   if (cp[5] == 0)
                     return cp - str + 5;
                   if (cp[6] == 0)
                     return cp - str + 6;
                   if (cp[7] == 0)
                     return cp - str + 7;
                 }
             }
         }
     }

    What is the magic number being used for?

  2. Why not simply increment a pointer until it reaches the null character and return the count? Is this approach faster? If so, why?
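For comparison, here is a minimal sketch of the simple approach the second question describes. It advances a pointer one byte at a time until it reaches the null character, then returns the distance travelled. The name naive_strlen is hypothetical and was chosen so it does not clash with the library's strlen.

```c
#include <stddef.h>

/* Hypothetical byte-at-a-time strlen: one load, one compare and one
   branch per character until the terminating '\0' is found. */
size_t naive_strlen(const char *str)
{
    const char *p = str;
    while (*p != '\0')
        ++p;
    return (size_t)(p - str);
}
```

Every iteration handles only a single byte. The glibc loop instead spends a few arithmetic instructions on each 4- or 8-byte word.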

This is used to examine 4 bytes (32 bits), or even 8 bytes (64 bits), in one go, to check whether any of them is zero (the end of the string), instead of checking each byte individually.
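The constants in the quoted code scale with the word size, and that is what lets the same loop test 4 or 8 bytes at once. Below is a minimal sketch that builds them the same way the glibc code does. It assumes only that unsigned long is 4 or 8 bytes wide. The helper name make_magic is hypothetical.

```c
/* Builds the per-byte constants used by the quoted glibc loop, widened to
   the width of unsigned long in the same way: on a 64-bit long, the 32-bit
   pattern is shifted left by 32 bits in two 16-bit steps and OR-ed back in. */
static void make_magic(unsigned long *lomagic, unsigned long *himagic)
{
    unsigned long lo = 0x01010101UL;  /* 0x01 in every byte of a 32-bit word */
    unsigned long hi = 0x80808080UL;  /* 0x80 in every byte of a 32-bit word */

    if (sizeof(unsigned long) > 4) {
        lo = ((lo << 16) << 16) | lo;
        hi = ((hi << 16) << 16) | hi;
    }
    *lomagic = lo;
    *himagic = hi;
}
```

Where unsigned long is 64 bits (for example x86-64 Linux), this yields 0x0101010101010101 and 0x8080808080808080. Where it is 32 bits, the constants stay 0x01010101 and 0x80808080.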

Here is one way to check for a null byte:

unsigned int v; // 32-bit word to check if any 8-bit byte in it is 0
bool hasZeroByte = ~((((v & 0x7F7F7F7F) + 0x7F7F7F7F) | v) | 0x7F7F7F7F);
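The sketch below wraps the expression above in a function so it can be tried on concrete words. The name has_zero_byte is hypothetical, and uint32_t replaces unsigned int so that the word is exactly 32 bits. Each byte b is processed as (b & 0x7F) + 0x7F, which never carries into the next byte. As a result, the high bit of each byte of the complemented result is set exactly when that byte of v is zero.

```c
#include <stdbool.h>
#include <stdint.h>

/* The Bit Twiddling Hacks test quoted above, wrapped in a function.
   Nonzero (true) exactly when some byte of v is 0x00. */
static bool has_zero_byte(uint32_t v)
{
    return ~((((v & 0x7F7F7F7FU) + 0x7F7F7F7FU) | v) | 0x7F7F7F7FU) != 0;
}
```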

For more techniques, see Bit Twiddling Hacks.

The one used here (32-bit example), as described on the same page:

There is yet a faster method: use hasless(v, 1), which is defined on that page. It works in 4 operations and requires no subsequent verification. It simplifies to

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)
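The following small sketch applies the haszero macro to 4 bytes of a string. The helper chunk_has_nul is hypothetical. It copies the bytes into a uint32_t with memcpy, which avoids alignment and strict-aliasing problems in portable C. The trade-off is that it does not mirror glibc's direct aligned loads. Byte order does not affect the yes/no answer.

```c
#include <stdint.h>
#include <string.h>

#define haszero(v) (((v) - 0x01010101UL) & ~(v) & 0x80808080UL)

/* Hypothetical helper: copies 4 bytes starting at p into a 32-bit word
   and reports whether any of them is '\0'. p must point to at least
   4 readable bytes. */
static int chunk_has_nul(const char *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return haszero(w) != 0;
}
```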

The subexpression (v - 0x01010101UL) sets the high bit of every byte whose corresponding byte in v is zero or greater than 0x80. The subexpression ~v & 0x80808080UL sets the high bit of every byte of v whose own high bit is clear (that is, the byte is less than 0x80). ANDing the two therefore leaves high bits set only where the bytes of v were zero, because the second subexpression masks off the high bits that the first one sets for bytes greater than 0x80. One caveat: a borrow out of a zero byte can also flag a 0x01 byte above it (for example, v = 0x00000100 yields 0x80808080). The result therefore tells reliably whether v contains a zero byte, and its least significant flagged byte is a real zero, but higher flags can be spurious. This is why the exact position is still found by testing the bytes individually.
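To check the explanation step by step, a hypothetical helper can expose each subexpression for a 32-bit word. The struct and function names are illustrative only and are not from glibc.

```c
#include <stdint.h>

/* Hypothetical trace of the two haszero subexpressions for a 32-bit word. */
typedef struct {
    uint32_t minus_lo;  /* v - 0x01010101: high bit set where byte is 0 or > 0x80
                           (ignoring borrows between bytes) */
    uint32_t not_hi;    /* ~v & 0x80808080: high bit set where byte < 0x80 */
    uint32_t result;    /* AND of the two: nonzero iff v has a zero byte */
} haszero_steps;

static haszero_steps haszero_trace(uint32_t v)
{
    haszero_steps s;
    s.minus_lo = v - 0x01010101u;
    s.not_hi   = ~v & 0x80808080u;
    s.result   = s.minus_lo & s.not_hi;
    return s;
}
```

Take v = 0x41900042 (bytes 0x41, 0x90, 0x00, 0x42). The first subexpression (0x408EFF41) flags both the 0x90 byte and the zero byte, and the second (0x80008080) flags every byte below 0x80. Only the zero byte survives the AND, so the result is 0x00008000.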

Examining one byte at a time costs at least as many CPU cycles per step as examining a full register-width integer. This algorithm therefore checks whole integers to see whether they contain a zero byte. If not, only a few instructions are spent before moving on to the next integer. If there is a zero byte inside, a further check finds its exact position.
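Below is a sketch of the same idea in portable C. It assumes a 64-bit word and a known upper bound on the buffer size. This is a swapped-in strnlen-style variant, not glibc's code. glibc reads aligned words that may extend past the terminator. That is safe in practice, because an aligned word never crosses a page boundary, but it falls outside what ISO C guarantees. This version never touches bytes beyond s + maxlen. The names word_strnlen and HASZERO64 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same zero-byte test as the haszero macro, on 64-bit words. */
#define HASZERO64(v) (((v) - UINT64_C(0x0101010101010101)) & ~(v) \
                      & UINT64_C(0x8080808080808080))

/* Hypothetical bounded variant: returns the length of s, or maxlen if no
   '\0' occurs in the first maxlen bytes. Word loads use memcpy and stay
   inside the buffer. */
size_t word_strnlen(const char *s, size_t maxlen)
{
    size_t i = 0;

    /* Fast path: test 8 bytes per iteration while a full word remains. */
    while (maxlen - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);
        if (HASZERO64(w) != 0)
            break;              /* some byte in this word is zero */
        i += sizeof w;
    }

    /* Slow path: find the exact position byte by byte. This also covers
       the final partial word. */
    while (i < maxlen && s[i] != '\0')
        ++i;
    return i;
}
```

When the fast path stops because HASZERO64 fired, the byte loop is guaranteed to find the zero within the next 8 bytes. That mirrors the "locate the exact position" step described above.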
