
Regular Expression for Binary Numbers Divisible by 3

I am self-studying regular expressions and found an interesting practice problem online: write a regular expression that recognizes all binary numbers divisible by 3 (and only such numbers). To be honest, the problem asked for a DFA, but I figured it should be equally possible with a regular expression.

I know there's a little rule for deciding whether a binary number is divisible by 3: take the number of ones in even positions and subtract the number of ones in odd positions; if the result is zero, the number is divisible by 3 (example: 110 has a 1 in the even position 2 and a 1 in the odd position 1). However, I'm having some trouble adapting this to a regular expression.

The closest I've come is realizing that the number can be 0, so that would be the first state. I also saw that all binary numbers divisible by 3 begin with 1, so that would be the second state, but I'm stuck from there. Could someone help out?

Following what Oli Charlesworth says, you can build a DFA for divisibility of a base-b number by a given divisor d, where the states of the DFA represent the remainder of the division.

For your case (base 2, i.e. binary numbers, with divisor d = 3 in decimal):

[Figure: initial DFA]

Note that the DFA above accepts the empty string as a "number" divisible by 3. This can easily be fixed by adding one more intermediate state in front:

[Figure: fixed DFA]
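The construction above can be sketched in Python. This is a minimal sketch, not the answer author's code: the names `make_dfa` and `accepts` and the `"start"` state label are made up for illustration. Each state is a remainder, reading a digit x moves from remainder r to (r * base + x) mod divisor, and the extra start state is what rejects the empty string.

```python
def make_dfa(base, divisor):
    """Return the transition table {(state, digit_char): next_state}.

    States 0..divisor-1 are remainders; "start" is the extra initial
    state that keeps the empty string from being accepted.
    """
    digits = "0123456789abcdefghijklmnopqrstuvwxyz"[:base]
    table = {}
    for value, ch in enumerate(digits):
        # From the start state, the first digit is itself the number so far.
        table[("start", ch)] = value % divisor
        for r in range(divisor):
            table[(r, ch)] = (r * base + value) % divisor
    return table


def accepts(table, text):
    """Run the DFA over text; accept only if it ends in remainder 0."""
    state = "start"
    for ch in text:
        if (state, ch) not in table:
            return False  # character outside the alphabet
        state = table[(state, ch)]
    return state == 0


dfa3 = make_dfa(2, 3)
print(accepts(dfa3, "110"))  # 6 is divisible by 3
print(accepts(dfa3, "111"))  # 7 is not
print(accepts(dfa3, ""))     # the empty string never leaves "start"
```

The same `make_dfa(10, 7)` call would give the base-10, divisor-7 table that the recursive regex further down encodes.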

Conversion to a theoretical regular expression can be done with the normal process.
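For the three-state binary DFA, eliminating the remainder-2 state and then the remainder-1 state gives one possible result, `(0|1(01*0)*1)+`. This expression is worked out here rather than quoted from the answer, so treat it as an illustration. The sketch below checks it against ordinary integer arithmetic with Python's `re` module; the names `DIV3` and `divisible_by_3` are made up for the example.

```python
import re

# (0|1(01*0)*1)+ : stay in remainder 0 on "0", or leave it on "1",
# loop through remainder 2 via 0 1* 0, and come back on "1".
# The trailing + (instead of *) rejects the empty string.
DIV3 = re.compile(r"(?:0|1(?:01*0)*1)+")


def divisible_by_3(bits):
    """True if the whole bit string matches the derived expression."""
    return DIV3.fullmatch(bits) is not None


for n in (0, 3, 5, 6, 21, 22):
    print(n, format(n, "b"), divisible_by_3(format(n, "b")))
```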

Once you have the DFA, conversion to a practical regex is easy in flavors that support recursive regex. This is shown for the case (base b = 10, d = 7 in decimal) in this question from CodeGolf.SE.

Let me quote the regex from the answer by Lowjacker, written in the Ruby regex flavor:

(?!$)(?>(|(?<B>4\g<A>|5\g<B>|6\g<C>|[07]\g<D>|[18]\g<E>|[29]\g<F>|3\g<G>))(|(?<C>[18]\g<A>|[29]\g<B>|3\g<C>|4\g<D>|5\g<E>|6\g<F>|[07]\g<G>))(|(?<D>5\g<A>|6\g<B>|[07]\g<C>|[18]\g<D>|[29]\g<E>|3\g<F>|4\g<G>))(|(?<E>[29]\g<A>|3\g<B>|4\g<C>|5\g<D>|6\g<E>|[07]\g<F>|[18]\g<G>))(|(?<F>6\g<A>|[07]\g<B>|[18]\g<C>|[29]\g<D>|3\g<E>|4\g<F>|5\g<G>))(|(?<G>3\g<A>|4\g<B>|5\g<C>|6\g<D>|[07]\g<E>|[18]\g<F>|[29]\g<G>)))(?<A>$|[07]\g<A>|[18]\g<B>|[29]\g<C>|3\g<D>|4\g<E>|5\g<F>|6\g<G>)

Breaking it down, you can see how it is constructed. The atomic grouping (or non-backtracking group, or a group that behaves possessively) is used to make sure only the empty-string alternative is matched. This is a trick to emulate (?(DEFINE)...) in Perl. The groups A to G then correspond to remainders 0 to 6 when the number is divided by 7.

(?!$)
(?>
  (|(?<B>4   \g<A>|5   \g<B>|6   \g<C>|[07]\g<D>|[18]\g<E>|[29]\g<F>|3   \g<G>))
  (|(?<C>[18]\g<A>|[29]\g<B>|3   \g<C>|4   \g<D>|5   \g<E>|6   \g<F>|[07]\g<G>))
  (|(?<D>5   \g<A>|6   \g<B>|[07]\g<C>|[18]\g<D>|[29]\g<E>|3   \g<F>|4   \g<G>))
  (|(?<E>[29]\g<A>|3   \g<B>|4   \g<C>|5   \g<D>|6   \g<E>|[07]\g<F>|[18]\g<G>))
  (|(?<F>6   \g<A>|[07]\g<B>|[18]\g<C>|[29]\g<D>|3   \g<E>|4   \g<F>|5   \g<G>))
  (|(?<G>3   \g<A>|4   \g<B>|5   \g<C>|6   \g<D>|[07]\g<E>|[18]\g<F>|[29]\g<G>))
)
(?<A>$|  [07]\g<A>|[18]\g<B>|[29]\g<C>|3   \g<D>|4   \g<E>|5   \g<F>|6   \g<G>)
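The alternatives inside each named group can be regenerated from the remainder arithmetic. The following is a rough sketch, with the hypothetical helper names `digit_class` and `group_alternatives`; it assumes group A stands for remainder 0, B for remainder 1, and so on, as described above. A digit d goes from remainder r to group (r * 10 + d) mod 7.

```python
def digit_class(digits):
    """Format a list of digits the way the quoted regex does: 3 or [07]."""
    chars = "".join(str(d) for d in sorted(digits))
    return chars if len(chars) == 1 else "[" + chars + "]"


def group_alternatives(remainder, base=10, divisor=7):
    """List (digit class, target group name) pairs for one remainder state."""
    names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    pairs = []
    for target in range(divisor):
        digits = [d for d in range(base)
                  if (remainder * base + d) % divisor == target]
        if digits:
            pairs.append((digit_class(digits), names[target]))
    return pairs


# Rebuild the body of group B (remainder 1).
body = "|".join(cls + r"\g<" + name + ">" for cls, name in group_alternatives(1))
print(body)
```

The printed string should be the body of group B in the regex above; group A differs only in the extra `$` alternative that ends the recursion.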

I have another approach to this problem, and I think it is easier to understand. When we divide a number by 3 there are three possible remainders: 0, 1, 2. A number that is divisible by 3 can be written as 3t (t is a natural number).


When we append a 0 to a binary number whose remainder is 0, the value is doubled, because each digit moves to a higher position. 3t * 2 = 6t, which is also divisible by 3.

When we append a 1 to a binary number whose remainder is 0, the value is doubled plus 1, because each digit moves to a higher position and a 1 follows. 3t * 2 + 1 = 6t + 1, so the remainder is 1.


When we append a 1 to a binary number whose remainder is 1, the value is doubled plus 1 and the remainder is 0: (3t + 1)*2 + 1 = 6t + 3 = 3(2t + 1), which is divisible by 3.

When we append a 0 to a binary number whose remainder is 1, the value is doubled and the remainder is 2: (3t + 1)*2 = 6t + 2.


When we append a 0 to a binary number whose remainder is 2, the remainder is 1: (3t + 2)*2 = 6t + 4 = 3(2t + 1) + 1.

When we append a 1 to a binary number whose remainder is 2, the remainder is still 2: (3t + 2)*2 + 1 = 6t + 5 = 3(2t + 1) + 2.

No matter how many 1s you append to a binary number whose remainder is 2, the remainder stays 2. For example, (3(2t + 1) + 2)*2 + 1 = 3(4t + 2) + 5 = 3(4t + 3) + 2.


So we can draw the DFA that describes these binary numbers:

[Figure: DFA describing binary numbers divisible by 3]

Note: the edge q2 -> q1 should be labelled 0.
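The six cases above can be written down as a lookup table and checked against the formula "double, then add the new bit". This is only a sketch; `TRANSITIONS` and `remainder_mod_3` are illustrative names, and the state numbers 0, 1, 2 are assumed to correspond to q0, q1, q2 in the figure.

```python
# Transitions read off the six cases: (remainder, appended bit) -> new remainder.
TRANSITIONS = {
    (0, "0"): 0, (0, "1"): 1,
    (1, "0"): 2, (1, "1"): 0,
    (2, "0"): 1, (2, "1"): 2,
}


def remainder_mod_3(bits):
    """Feed the bits, most significant first, through the table."""
    r = 0
    for b in bits:
        r = TRANSITIONS[(r, b)]
    return r


# Every entry agrees with (2 * r + bit) mod 3.
for (r, b), nxt in TRANSITIONS.items():
    assert nxt == (2 * r + int(b)) % 3

print(remainder_mod_3("1001"))  # 9 = 3 * 3, so this prints 0
```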

Binary numbers divisible by 3 fall into 3 categories:

  1. Numbers with two consecutive 1's or two 1's separated by an even number of 0's. Effectively every pair "cancels" itself out.

(ex. 11, 110, 1100, 1001, 10010, 1111)

(decimal: 3, 6, 12, 9, 18, 15)

  2. Numbers with three 1's each separated by an odd number of 0's. These triplets also "cancel" themselves out.

(ex. 10101, 101010, 1010001, 1000101)

(decimal: 21, 42, 81, 69)

  3. Some combination of the first two rules (including inside one another).

(ex. 1010111, 1110101, 1011100110001)

(decimal: 87, 117, 5937)

So a regular expression that takes into account these three rules is simply:

0*(1(00)*10*|10(00)*1(00)*(11)*0(00)*10*)*0*

How to read it:

() groups a sub-expression

* means the preceding digit or group may be repeated zero or more times

| indicates a choice between the options on either side, within the parentheses
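A quick way to sanity-check the expression is to compare it against plain integer arithmetic. The sketch below uses Python's `re.fullmatch`; the names `PATTERN` and `matches` are made up for illustration. Each alternative brings the remainder back to 0, so the pattern should never accept a non-multiple of 3, but it does not cover every multiple: for example, 165 (10100101) is divisible by 3 and is not matched. The expression is therefore best read as covering the patterns it spells out rather than every case.

```python
import re

PATTERN = re.compile(r"0*(1(00)*10*|10(00)*1(00)*(11)*0(00)*10*)*0*")


def matches(n):
    """True if the binary form of n matches the whole pattern."""
    return PATTERN.fullmatch(format(n, "b")) is not None


limit = 1024
false_positives = [n for n in range(limit) if matches(n) and n % 3 != 0]
missed = [n for n in range(limit) if n % 3 == 0 and not matches(n)]

print("matched but not divisible by 3:", false_positives)
print("divisible by 3 but not matched:", missed[:10])
```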

The problem you're encountering is that whilst your trick is (probably) valid, it doesn't map to a practical DFA: you would have to track a potentially arbitrary difference between the number of ones in even and odd positions, which would require an arbitrary number of states.
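The trick itself can be checked numerically. Because 2 is congruent to -1 modulo 3, each 1 bit contributes +1 or -1 depending on its position, so the number is divisible by 3 exactly when the difference of the two counts is a multiple of 3 (not necessarily zero: 10101 = 21 has a difference of 3). A small sketch, with the made-up name `alternating_difference` and positions counted from the least significant bit:

```python
def alternating_difference(bits):
    """Ones in even positions minus ones in odd positions (position 0 = LSB)."""
    diff = 0
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            diff += 1 if position % 2 == 0 else -1
    return diff


for n in (6, 21, 22):
    bits = format(n, "b")
    print(n, bits, alternating_difference(bits), n % 3 == 0)
```

The difference can grow without bound as the input gets longer, which is why counting it directly does not fit in a finite number of states, while its value modulo 3 does.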

An alternative approach is to note that (working from MSB to LSB) after the i-th character, x[i], the prefix read so far must be equal to 0, 1, or 2 in modulo-3 arithmetic; call this value S[i]. x[i+1] must be either 0 or 1, and appending it is equivalent to multiplying by 2 and optionally adding 1.

So if you know S[i] and x[i+1], you can calculate S[i+1]. Does that description sound familiar?
