
Regular expression to match a number followed by a symbol repeated that many times?

How can I create a RegEx that can match the following:

a3bbb
aaaa3bbb
a4bbbb
aaa5bbbbb

I.e., a (one or more times), then a non-negative number, then b repeated that many times (as many times as the number between the a's and the b's).

Is this language regular? If not, can we construct a CFG for this?

Edit: As for whether the number is a single digit, I would say no. (Also, as Daniel Centore and rici point out, the language is not even context-free. The natural follow-up question is then: is it context-sensitive or unrestricted?)

As other answers have said, if the number is unbounded, the language is neither regular nor context-free. If it were regular, the pumping lemma would say that in a sufficiently long string the b's could be extended indefinitely, independent of the number. If it were context-free, the pumping lemma for context-free languages would say that for a sufficiently long number, parts of the number and of the b's could be repeated together, but the repeated strings would not keep the count of b's equal to the number.

But the language is context-sensitive, as it can be generated by the following grammar (I use base-3 numbers for simplicity; you can extend it to base 10):

(1) S -> aS | aB
(2) B -> BN | N
(3) aN -> a0 | a1b | a2bb
(4) 0N -> 00 | 01b | 02bb
(5) 1N -> 10 | 11b | 12bb
(6) 2N -> 20 | 21b | 22bb
(7) bN -> WN
(8) WN -> WX
(9) WX -> NX
(10) NX -> Nbbb

Rule (1) generates the a's.

Rule (2) generates one N for each digit of the number.

Rules (3)-(6) replace the leftmost N with a digit and the corresponding number of b's.

Rules (7)-(10) let an N "consume" the b's to its left, producing 3 b's to its right for each b consumed (10 b's in base 10). Technically, (7)-(10) together are just bN -> Nbbb, split into steps that each rewrite a single symbol.

Example:

To generate: a102bbbbbbbbbbb (102 in base-3 = 11 in base-10)
S
aB (1b)
aBN (2a)
aBNN (2a)
aNNN (2b)
a1bNN (3b)
a1NbbbN (7)-(10)
a1NbbNbbb (7)-(10)
a1NbNbbbbbb (7)-(10)
a1NNbbbbbbbbb (7)-(10)
a10Nbbbbbbbbb (5a)
a102bbbbbbbbbbb (4c)
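
The derivation above can also be replayed mechanically. What follows is a minimal Python sketch, not part of the original answer: the function name `derive` and its parameters are made up for illustration, rules (7)-(10) are collapsed into their net effect bN -> Nbbb, and the rules are applied in a fixed order (expand the leftmost N, then push every b across the N's to its right). It assumes a base of at most 10 so that each digit is a single character.

```python
def derive(num_as: int, digits: str, base: int = 3) -> str:
    """Replay the grammar for a given count of a's and a digit string.

    Hypothetical helper for illustration; assumes 2 <= base <= 10.
    """
    if num_as < 1 or not digits:
        raise ValueError("need at least one 'a' and one digit")
    if any(not ("0" <= d <= "9") or int(d) >= base for d in digits):
        raise ValueError("digit out of range for this base")

    # Rules (1) and (2): the a's, then one N per digit.
    form = "a" * num_as + "N" * len(digits)

    for d in digits:
        # Rules (3)-(6): the leftmost N sits right after an 'a' or a digit;
        # replace it with the digit d followed by d copies of 'b'.
        i = form.index("N")
        form = form[:i] + d + "b" * int(d) + form[i + 1:]

        # Net effect of rules (7)-(10): bN -> N followed by `base` b's.
        # Repeat until no 'b' is immediately left of an N.
        while "bN" in form:
            form = form.replace("bN", "N" + "b" * base, 1)

    return form


if __name__ == "__main__":
    print(derive(1, "102"))
```

Each time a b crosses an N it is multiplied by the base, which is how the count of b's ends up equal to the value of the digit string.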

This language is not regular (and thus cannot be expressed as a RegEx). One test for regularity is to check whether the language can be recognized by a finite automaton (FA). It can be shown that no FA recognizes this language, because the FA would need at least as many states as the number between the a's and the b's, and that number is not bounded. However, if the number is bounded (e.g., it can only be from 1 to 10), then the language is regular.
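
For the bounded case, one way to see that the language is regular is to write out every allowed number with its run of b's as a separate alternative. The following is a hedged Python sketch; the helper name `bounded_pattern` and the bound of 10 are assumptions for illustration only.

```python
import re


def bounded_pattern(max_n: int) -> re.Pattern:
    """Build a regex for a+ N b^N with 1 <= N <= max_n (hypothetical helper)."""
    # One alternative per allowed number, e.g. "3b{3}" for N = 3.
    alternatives = "|".join(f"{n}b{{{n}}}" for n in range(1, max_n + 1))
    return re.compile(rf"a+(?:{alternatives})")


if __name__ == "__main__":
    pattern = bounded_pattern(10)
    for s in ["a3bbb", "aaaa3bbb", "a4bbbb", "aaa5bbbbb", "a3bb"]:
        print(s, bool(pattern.fullmatch(s)))
```

The pattern grows with the bound, which matches the argument above: more allowed numbers mean more states, and no finite pattern covers an unbounded number.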

The language also cannot be expressed as a CFG, which can probably be shown using the pumping lemma for context-free languages.

If the number is a single digit, then the language is regular (because you can just list the possible suffixes: nine of them, or ten if 0 is allowed). But if the number is not bounded, the language is not regular. It is not even context-free. So neither a regular expression nor a CFG is available.
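
In practice, the unbounded case can still be checked with a little code around a regular expression: the regex validates the overall shape and captures the pieces, and ordinary code compares the count. This is a minimal Python sketch, not something from the answers above; the name `matches` is made up, the number is assumed to be written in decimal, and leading zeros are accepted.

```python
import re

# Shape only: one or more a's, a decimal number, then any run of b's.
_SHAPE = re.compile(r"(a+)([0-9]+)(b*)")


def matches(s: str) -> bool:
    """Return True if s is a+ N b^N for a decimal number N (hypothetical helper)."""
    m = _SHAPE.fullmatch(s)
    # The counting step is the part a regular expression cannot do.
    return m is not None and int(m.group(2)) == len(m.group(3))


if __name__ == "__main__":
    for s in ["a3bbb", "aaaa3bbb", "a4bbbb", "aaa5bbbbb", "a3bb"]:
        print(s, matches(s))
```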

