Bit manipulation with left shift and check string

I am working on some code that checks for repeated characters in a string. Here is an answer I found somewhere:

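#include <cstddef>
#include <string>

// The original snippet is only a function body; this wrapper and its name are hypothetical.
// Returns the length of the longest substring of s without repeated characters.
// Assumes s contains only lowercase letters 'a'..'z'.
int longestUniqueRun(const std::string& s)
{
    int checker = 0, val = 0, max = 0, count = 0;
    std::size_t j = 0;
    for (std::size_t i = 0; i < s.size() && j < s.size(); i++)
    {
        j = i;
        while (j < s.size())
        {
            val = s[j] - 'a';                       // 'a' -> 0 ... 'z' -> 25
            if ((checker & (1 << val)) > 0) break;  // letter already seen
            checker |= 1 << val;                    // mark letter as seen
            j++;
            count++;
        }
        if (count > max) max = count;
        checker = 0;
        count = 0;
    }
    return max;
}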

The method is clear and straightforward. However, I am confused by two lines:

        val = s[j]-'a';
        if ((checker & (1<<val)) >0) break;
        checker |= 1 << val;

What I don't understand is this: val is some value after the subtraction. Then (1 << val) shifts 1 left by val bits, which as I understand it is 1*2^val. So 1 << val would need to equal 1 to jump out of the loop. But how is that achieved? Thanks.

Let's break it down line by line.

val = s[j]-'a';

This is a nifty trick that converts any character in the range 'a'..'z' to a number 0..25. You usually see it as c - '0' to convert a digit character to a number, but it works just as well for letters. It relies on the fact that in ASCII/UTF-8 the lowercase letters are contiguous, so if you treat a character as a number and subtract the starting letter, you get the character's offset, with 'a' being 0 and 'z' being 25. (The C and C++ standards guarantee contiguity only for the digits '0'..'9'; for letters it is a property of ASCII-compatible encodings.)
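A minimal C++ sketch of the offset arithmetic described above. The function names letterIndex and digitValue are hypothetical, and the letter mapping assumes an ASCII-compatible encoding:

```cpp
// Map a lowercase letter to its offset from 'a': 'a' -> 0, 'b' -> 1, ..., 'z' -> 25.
// Assumes c is in 'a'..'z' and the encoding is ASCII-compatible;
// other characters give values outside 0..25.
int letterIndex(char c) { return c - 'a'; }

// The same trick for digits: '0' -> 0, ..., '9' -> 9.
// Digit contiguity is guaranteed by the C++ standard itself.
int digitValue(char c) { return c - '0'; }
```

Under ASCII, letterIndex('A') is -32, a negative shift amount, which is one reason the original loop only works on lowercase input.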

if ((checker & (1<<val)) >0) break;

The key here is to understand what 1<<val does. It left-shifts a single 1 bit by val positions. So for 'a' you get 0b1, for 'b' you get 0b10, and so on. Effectively, it one-hot encodes a letter as one bit of a 32-bit integer. If we then & this with our checker value, which records the same one-hot bitfield for the letters we've already seen, the result will be >0 if and only if checker contains a 1 in the bit representing that letter. If so, we've found a duplicate, so we break.
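The masks and the AND test can be sketched as below. This is an illustration, not the original code: letterMask and alreadySeen are hypothetical names, std::uint32_t is used instead of int so the shifts stay well defined, and input is assumed to be 'a'..'z':

```cpp
#include <cstdint>

// One-hot mask for a lowercase letter: 'a' -> 0b1, 'b' -> 0b10, 'c' -> 0b100, ...
// Assumes c is in 'a'..'z' (a negative shift amount would be undefined).
std::uint32_t letterMask(char c) { return std::uint32_t{1} << (c - 'a'); }

// True if the bit for c is already set in checker, i.e. c was seen before.
bool alreadySeen(std::uint32_t checker, char c)
{
    return (checker & letterMask(c)) != 0;
}
```

For example, a checker of 5 (binary 101) has the bits for 'a' and 'c' set, so alreadySeen(5, 'c') is true and alreadySeen(5, 'b') is false.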

checker |= 1 << val;

If we've gotten here, checker didn't contain a 1 in the bit for that letter. We've now seen this letter, so we need to update checker. OR-ing checker with the same 1 << val mask (|=) always sets exactly that single bit to 1 while leaving all other bits unchanged.
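A small sketch of the |= step, again with a hypothetical helper name (markSeen) and std::uint32_t in place of int, assuming 'a'..'z' input:

```cpp
#include <cstdint>

// Record letter c in checker by setting its bit; every other bit is unchanged.
// Assumes c is in 'a'..'z'.
std::uint32_t markSeen(std::uint32_t checker, char c)
{
    return checker | (std::uint32_t{1} << (c - 'a'));
}
```

markSeen(1, 'c') gives 5 (binary 101); calling markSeen(5, 'c') again still gives 5, because OR-ing a bit that is already 1 changes nothing.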

Piece by piece:

Set val to the current character minus 'a'; that is, 'a' gives 0 and 'z' gives 25:

val = s[j]-'a';

Check the bit in checker: if bit val is already set in checker, break. This works by bitwise AND-ing the value with the bitmask; if the bit is set, the result is nonzero, and it should be positive (assumptions, assumptions).

if ((checker & (1<<val)) >0) break;

Otherwise, set bit val to 1 by OR-ing it in:

checker |= 1 << val;
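Putting the three lines together, a hypothetical helper such as traceChecker below can record checker after each new letter, stopping at the first repeat. It keeps the original int and the > 0 test, and assumes lowercase input:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: run the three lines above over s from index 0 and
// record checker after each new letter; stop at the first repeated letter.
// Assumes s contains only 'a'..'z'.
std::vector<int> traceChecker(const std::string& s)
{
    std::vector<int> history;
    int checker = 0;
    for (char ch : s)
    {
        int val = ch - 'a';                     // 'a' -> 0 ... 'z' -> 25
        if ((checker & (1 << val)) > 0) break;  // bit already set: duplicate
        checker |= 1 << val;                    // mark this letter as seen
        history.push_back(checker);
    }
    return history;
}
```

For "abca", checker goes 1 (binary 1), 3 (binary 11), 7 (binary 111), and the second 'a' hits the already-set bit 0, so the loop breaks.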

The code makes lots of assumptions; for example, int needs at least 26 value bits, and characters outside 'a'..'z' in the string could cause undefined behaviour (shifting by a negative amount, or by at least the width of int, is undefined).
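One way to make those assumptions explicit is sketched below. The function name longestUniqueRunChecked and the choice to throw std::invalid_argument on other characters are illustrative assumptions, not part of the original; the brute-force structure is the same as the question's loop, with std::uint32_t for the bitmask:

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Same brute-force idea as the question's code, with the assumptions made explicit:
// - checker is std::uint32_t, so 26 bits are guaranteed to fit;
// - any character outside 'a'..'z' throws instead of producing an invalid shift
//   (throwing is an illustrative choice; the range test assumes ASCII letters).
int longestUniqueRunChecked(const std::string& s)
{
    int best = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        std::uint32_t checker = 0;  // fresh bitmask for each start position
        int count = 0;
        for (std::size_t j = i; j < s.size(); ++j)
        {
            char ch = s[j];
            if (ch < 'a' || ch > 'z')
                throw std::invalid_argument("expected only 'a'..'z'");
            std::uint32_t mask = std::uint32_t{1} << (ch - 'a');
            if ((checker & mask) != 0) break;  // repeated letter
            checker |= mask;
            ++count;
        }
        if (count > best) best = count;
    }
    return best;
}
```

With this version, "abcabcbb" gives 3, while "ab1" throws instead of shifting by a negative amount.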

The code's author uses the variable checker as a bitmask to remember which characters he has already seen. The line:

val = s[j] - 'a';

normalizes the ASCII value of the character s[j] by subtracting the ASCII value of 'a'. Basically, he is working out which letter of the alphabet this character is, as a number in the range [0, 25] for lowercase letters: a is 0, b is 1, c is 2, and so on.

He then checks whether this bit is already set in checker. He does this by left-shifting 1 by the normalized value and AND-ing the result with checker. If that bit is not set in checker, the bitwise AND returns zero and the loop continues. If it is set, the AND returns a nonzero value and his test breaks out of the loop.

When the bit is not set, he then sets the bit in checker that corresponds to that position: if the character was 'a', the least significant bit is set; if 'b', the second least significant bit; and so on. He does this by bitwise OR-ing 1 << val into the existing checker.

P.S. He could just as easily have made checker an array of 26 characters and done:

char checker[26] = { 0 };
...
    while(j < s.size() && !checker[s[j] - 'a'])
    {
        checker[s[j] - 'a'] = 1;
        ++j;
        ++count;
    }
...

I'm sure you would have understood that. It's basically what he is doing, except that he packs the array into a bitmask using bit manipulation. That way he can also clear all the set bits at once simply by setting checker to zero.
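For comparison, here is a hedged, complete version of the array-of-flags idea, with std::array<bool, 26> standing in for the char array and a hypothetical function name (longestUniqueRunArray). The fill(false) call plays the role of checker = 0; input is assumed to be 'a'..'z':

```cpp
#include <array>
#include <cstddef>
#include <string>

// Array-of-flags variant: each flag plays the role of one bit in checker.
// Hypothetical name; assumes s contains only 'a'..'z'.
int longestUniqueRunArray(const std::string& s)
{
    int max = 0;
    std::array<bool, 26> checker{};  // all false
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        checker.fill(false);         // equivalent of checker = 0 for the bitmask
        std::size_t j = i;
        int count = 0;
        while (j < s.size() && !checker[s[j] - 'a'])
        {
            checker[s[j] - 'a'] = true;  // equivalent of checker |= 1 << val
            ++j;
            ++count;
        }
        if (count > max) max = count;
    }
    return max;
}
```

The results match the bitmask version; the difference is only in storage (26 bools versus 26 bits of one integer) and in how the flags are cleared.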

The funny piece of code you show us makes a few assumptions:

  1. The string s contains only lowercase letters ('a'..'z').
  2. The type int has 32 bits (or more).

What the code does is set a bit in the checker variable for each character found so far (26 lowercase letters fit into a 31/32-bit int, with one bit per character). He would have been better off using a uint32_t, by the way.

By subtracting 'a' from the current character, he gets values 0..25, provided his string satisfies assumption 1.

The if() expression tests whether that bit has been set before, i.e. whether the character occurred before.

Whichever bit is set in checker, the AND result is != 0. And if assumption 1 holds, it is always > 0 (there is no way to reach bit 31, which is the sign bit).
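The sign-bit argument can be checked at compile time. A minimal sketch, assuming only 'a'..'z' input and an int of at least 28 bits (so that 1 << 26 is representable); kAllLetters is a hypothetical name:

```cpp
#include <climits>

// With only 'a'..'z', the highest bit ever set is bit 25, so even a checker
// with every letter marked stays well below INT_MAX and is never negative.
constexpr int kAllLetters = (1 << 26) - 1;  // bits 0..25 set
static_assert(kAllLetters == 67108863, "26 low bits set");
static_assert(kAllLetters > 0 && kAllLetters < INT_MAX, "sign bit never reached");
```

67108863 is 2^26 - 1, far below INT_MAX (2^31 - 1 for a 32-bit int), so bit 31 is never touched and the > 0 test behaves like != 0.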

Each bit of checker, counting from right to left, is marked for the corresponding character found. Say a b is found in the string: then the second bit from the right is set. If it's a c, it's the third bit, and so on. This checker bitmask is then used to match subsequent characters.
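To see the right-to-left layout, a hypothetical helper can build the bitmask for a string with std::bitset<26> (a swapped-in container, not what the original code uses) and print it with bit 0, for 'a', on the right. It assumes 'a'..'z' input; bitset::set throws std::out_of_range for positions the bitset cannot hold:

```cpp
#include <bitset>
#include <string>

// Hypothetical helper: mark every letter of s in a 26-bit mask and return it
// as a binary string, bit 25 ('z') on the left and bit 0 ('a') on the right.
// Assumes s contains only 'a'..'z'; a repeated letter just sets the same bit again.
std::string checkerBits(const std::string& s)
{
    std::bitset<26> checker;
    for (char ch : s)
        checker.set(ch - 'a');  // 'a' -> rightmost bit, 'b' -> second from right, ...
    return checker.to_string();
}
```

checkerBits("bc") ends in ...0110: the second and third bits from the right are set.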

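Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.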
