Is it possible to write a Perl / Java / etc. regular expression that matches decimal (non-)primes?

Question

Related questions/material:

How can we match a^nb^n with Java regex?
How to determine if a number is a prime with regex? (which deals with unary prime matching, while I'm looking for base ≥ 2; a nice trick nevertheless, and what got me thinking about this)
http://nikic.github.io/2012/06/15/The-true-power-of-regular-expressions.html

As is well known, the "regular expressions" supported by various programming languages generate languages that are non-regular in the formal sense and, as demonstrated in the above material, are able to recognize at least some context-sensitive languages.

The language L = {x | x is a prime number in base 10} is a context-sensitive language, since primality can be tested by a linear bounded automaton (but it is not a context-free language, by a pumping lemma argument).

So, is it possible to write a Perl or Java regular expression which accepts precisely all prime numbers in base 10? Feel free to substitute any other base ≥ 2, or to recognize precisely all composite numbers if that feels easier.

Using escapes to, say, run arbitrary Perl code is considered cheating. Doing repeated substitutions (which is easily Turing complete) is also out of scope; the entire work should be done inside the regular expression. This question is more about the boundaries of how powerful regular expressions actually are.

Answer 1

NOTE: These regexes were written for PHP and use possessive quantifiers, which are supported in many but not all languages; for example, JavaScript does not support them. They are also very inefficient and will quickly become infeasible.

EDIT: here it is for base 10:

\b(((\d)(?=[\d\s]*(\4{0,10}(n(?=.*n\3)|nn(?=.*1\3)|n{3}(?=.*2\3)|n{4}(?=.*3\3)|n{5}(?=.*4\3)|n{6}(?=.*5\3)|n{7}(?=.*6\3)|n{8}(?=.*7\3)|n{9}(?=.*8\3))?)))+)(?![\d\s]*(n(?=\4))++(..?1|(...*)\8+1))

I have used base 2 after this to make things easier.

EDIT: this one lets you pass in a string containing several binary numbers and matches those that are prime:

\b(((\d)(?=[\d\s]*(\4{0,2}n(?=.*\3)|\4{0,2})))+)(?![\d\s]*(n(?=\4))++(..?1|(...*)\7+1))

It does this by using a word boundary \b instead of the start-of-string anchor ^, by allowing any number of digits and spaces when moving forward to the ns, and by wrapping the whole portion that tests the base-1 representation in a negative look-ahead. Apart from that it works in the same way as the one below. As an example, 1111 1011 nnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnnn1 will match 1011.

I have managed to get something that I think works (checked up to 25); it matches non-primes. Here it is for base 2 (easier to explain):

^((\d)(?= \d*\s(\3{0,2}n(?=.*\2)|\3{0,2})))+\s(n(?=\3))*+\K(..?1|(..+?)\6+1)

This can be extended to any base n, but the regex grows very quickly. To get this regex to work I need a couple of prerequisites (a bit hacky): the input string must be the number, followed by a space, followed by at least as many n characters as the value of your number (for the number 10 you need at least 10 ns after it), followed by the nonzero digits of your base in order (e.g. 123456789 for base 10). For example: 11 nnnnnnnnnnnnnn1. This is because regexes have no accessible storage, so I need to use capturing groups for it. Finally, this regex uses the /x flag to ignore whitespace in the expression; strip out all the spaces if you don't want to use it.
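The input layout described above is easy to get wrong by hand, so here is a minimal Python sketch that builds it. The helper name build_input and its extra_ns parameter are made up for illustration, and the sketch assumes bases 2 to 10 so that every digit is a single decimal character.

```python
def build_input(number: str, base: int = 2, extra_ns: int = 0) -> str:
    """Return '<number> <ns><nonzero digits>' in the layout the regex expects."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    value = int(number, base)            # also rejects digits outside the base
    padding = "n" * (value + extra_ns)   # at least `value` ns are required
    digits = "".join(str(d) for d in range(1, base))  # the 0 digit is left out
    return f"{number} {padding}{digits}"


print(build_input("11", 2))    # 11 nnn1
print(build_input("10", 10))   # 10 nnnnnnnnnn123456789
```

Extra ns beyond the value of the number do no harm, which is why the example in the text can use more than the minimum.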

I will now explain how this works. The regex has 3 parts:

Part 1: this part converts the number from base n > 1 to base 1, held as a capturing group of ns

This is the part ^((\d)(?= \d*\s(\3{0,2}n(?=.*\2)|\3{0,2})))+ and it works very similarly to the a^nb^n example in the question. The ^ at the front means that the full match has to start at the beginning of the string; this is important later. The main structure of this code is ^((\d)(?= \d*\s (stuff)))+. This takes each digit between the start and the first space and performs a positive look-ahead using (\d)(?=), where \d is a digit and (?=) is a look-ahead. The \d is in a capturing group () for later use: it is the digit we are currently looking at.

The inside of the look-ahead is not actually there to check a condition ahead, but to build up a capturing group representing our number in base 1. The inside of the look-ahead looks like this:

\d*\s(\3{0,2}n(?=.*\2)|\3{0,2})

The part \d*\s moves the position we are looking at past the remaining digits and the space (\d is a digit and * means 0 or more times, as many as possible). This leaves us looking at the start of the ns.

(\3{0,2}n(?=.*\2)|\3{0,2})

is a self-referencing capturing group; this is where the digits you put at the end of the input come in. The group first matches its own previous value 0 to 2 times, as many times as possible (\3{0,2}, with \3 meaning capturing group 3 and {0,2} meaning match from 0 to 2 times). This means that if there is a number before the current digit, its base-1 representation is multiplied by 2; the multiplier would be 10 for base 10 or 16 for base 16. If this is the first digit, the group is still undefined, so it matches 0 times. It then adds either a single n or no n, depending on the digit we are currently working on (using its capturing group). It does this with a positive look-ahead to the end of the input, where we put the digits: n(?=.*\2) matches an n only if it can find anything followed by the digit we are working on. This lets it identify which digit we are working on at this point. (I would have used a look-behind, but they have to be fixed-length.) If you had base 3 and wanted to check whether the current digit is 2, you would use nn(?=.*1\2), which matches nn only if the digit is 2. The alternatives for all digits are joined with the or operator |, and if no digit is found we assume it is 0 and add no ns. As this happens inside the capturing group, what was matched is then saved in the group.

In summary, for each digit we look ahead, take the base-1 representation of the previous digits (saved in the capturing group), multiply it by the base by matching it that many times, then add the base-1 representation of the current digit and save the result in the group. Doing this for each digit in turn gives a base-1 representation of the number. Let's look at an example: 101 nnnnnnnnnnnnnnnnn1

First it goes to the start because of ^, so: 101 nnnnnnnnnnnnnnnnn1

Then it goes to the first digit and saves it in a capturing group: 1 01 nnnnnnnnnnnnnnnnn1

group 2 = 1

It uses a look-ahead with \d*\s to go past all the digits and the first space: 1 01 n nnnnnnnnnnnnnnnn1

It is now inside capturing group 3.

It takes this capturing group's previous value and matches it 0 to 2 times.

As it is undefined, it matches 0 times.

It now looks ahead again to try to find a digit matching the digit in capturing group 2: 1 01 n nnnnnnnnnnnnnnnn 1

As it has been found, it matches 1 n in capturing group 3: 1 01 nn nnnnnnnnnnnnnnn1

It now leaves group 3, updating its value, and leaves the look-ahead. group 3 = n

It now looks at the next digit and saves it in a capturing group: 1 0 1 nnnnnnnnnnnnnnnnn1

group 2 = 0

group 3 = n

It then also uses a look-ahead and goes to the first n: 1 0 1 n nnnnnnnnnnnnnnnn1

It then matches group 3 0 to 2 times, as many as possible, so it matches nn: 1 0 1 nn nnnnnnnnnnnnnnn1

It then uses a look-ahead to try to find the digit in group 2, which it cannot do (there is no 0 after the ns), so it adds no ns before returning out of group 3 and the look-ahead.

group 3 = nn

It now looks at the next digit and saves it in group 2: 10 1 nnnnnnnnnnnnnnnnn1. Using a look-ahead it goes to the start of the ns and matches group 3 twice: 10 1 nnnn nnnnnnnnnnnnn1. It then uses a look-ahead to try to find the digit in group 2; it finds it, so it matches an n and returns out of group 3 and the look-ahead. group 3 = nnnnn. Group 3 now contains the base-1 representation of our number.
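The walkthrough above is the usual "multiply by the base, then add the digit" scheme carried out on strings of ns. As a rough sketch of what group 3 holds after each digit (this only models the arithmetic in Python, it does not run the regex; the name unary_trace is made up for the example):

```python
def unary_trace(number: str, base: int = 2) -> list[str]:
    """Value of group 3 after each digit, modelled with plain strings."""
    group3 = ""   # an unset group behaves like "matched 0 times" here
    trace = []
    for ch in number:
        digit = int(ch, base)
        # \3{0,base} repeats the previous value; the digit alternative adds ns
        group3 = group3 * base + "n" * digit
        trace.append(group3)
    return trace


print(unary_trace("101"))   # ['n', 'nn', 'nnnnn']
```

The three values match the group 3 values reached in the walkthrough for 101.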

Part 2: reduces the ns to the size of the base-1 representation of your number

\s(n(?=\3))*+\K

This matches the space and then matches ns for as long as group 3 (the base-1 representation of your number) can still be matched ahead of them. It does this by matching n as many times as possible with *+, which is possessive (it never gives back what it has matched; this stops the match from being shrunk later to make the overall match succeed). Each n carries a positive look-ahead, n(?=\3), which means an n is matched only while a copy of group 3 follows it (\3 refers to capturing group 3). This leaves our base-1 representation and the digits as the only things left unmatched. We then use \K to say: start the reported match again from here.
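To see why this leaves exactly as many ns as the value of the number, the possessive loop can be imitated with a plain Python loop. This is a sketch of the behaviour described above, not a run of the regex itself; strip_surplus_ns is a hypothetical helper and tail stands for everything after the space.

```python
def strip_surplus_ns(tail: str, group3: str) -> str:
    """Imitate (n(?=\\3))*+ : consume an n while group 3 still follows it."""
    i = 0
    while i < len(tail) and tail[i] == "n" and tail.startswith(group3, i + 1):
        i += 1        # possessive: nothing consumed here is ever given back
    return tail[i:]   # what is left for the primality part to look at


# 17 ns of padding for the number 101 (value 5); any count >= 5 would do
print(strip_surplus_ns("n" * 17 + "1", "nnnnn"))   # nnnnn1
```

With exactly as many ns as the value, the loop consumes nothing and the whole run is left for Part 3.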

Part 3: we now use the same algorithm mentioned in the question to detect primes, except that we force it to match only between the start of the base-1 representation and the start of the digits. You can read how that works here: How to determine if a number is a prime with regex?
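The unary test from the linked question needs only a lazy quantifier and a backreference, so it can be tried in Python's re module. The following is a sketch of that well-known trick adapted to n characters, not the exact tail (..?1|(..+?)\6+1) used in this answer; the name is_non_prime is made up for the example.

```python
import re

# Either zero or one n (0 and 1 are not prime), or a block of two or more ns
# repeated at least twice, i.e. a product of two factors that are both >= 2.
NON_PRIME_UNARY = re.compile(r"n?|(nn+?)\1+")


def is_non_prime(value: int) -> bool:
    """True if `value`, written as that many ns, matches the non-prime pattern."""
    return NON_PRIME_UNARY.fullmatch("n" * value) is not None


print([v for v in range(12) if is_non_prime(v)])   # [0, 1, 4, 6, 8, 9, 10]
```

The backtracking over block sizes is what makes the approach so slow for larger numbers.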

Finally, to turn this into a base-n regex you have to do a few things:

 ^((\d)(?= \d*\s(\3{0,2}n(?=.*\2)|\3{0,2})))+\s(n(?=\3))*+\K(..?1|(..+?)\6+1)

First, add some more digits at the end of your input string, then change the n(?=.*\2) to n(?=.*n\2)|nn(?=.*1\2)|n{3}(?=.*2\2)|..., with one alternative per nonzero digit of the base.

Finally, change the \3{0,2} to \3{0,n}, where n is the base. Also remember that this will not work without the correct input string.
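Writing out the per-digit alternatives by hand is error-prone, so they can be generated. The sketch below is purely textual: it assumes the base-10 expression from the first EDIT is the template for other bases, and it only builds the pattern string without running it in any regex engine. The function names are made up for illustration, and bases above 10 are left out because they need digit characters that \d does not cover.

```python
def digit_alternation(base: int, digit_group: int = 3) -> str:
    """One alternative per nonzero digit: d ns, then a look-ahead that finds
    the digit in the trailing list right after its predecessor."""
    parts = []
    for d in range(1, base):
        ns = "n" * d if d <= 2 else f"n{{{d}}}"
        before = "n" if d == 1 else str(d - 1)   # digit 1 follows the last n
        parts.append(f"{ns}(?=.*{before}\\{digit_group})")
    return "|".join(parts)


def multi_number_regex(base: int) -> str:
    """Fill the base into the shape of the base-10 expression (assumed template)."""
    if not 2 <= base <= 10:
        raise ValueError("this sketch only handles bases 2 to 10")
    return (
        r"\b(((\d)(?=[\d\s]*(\4{0," + str(base) + "}(" + digit_alternation(base) + r")?)))+)"
        r"(?![\d\s]*(n(?=\4))++(..?1|(...*)\8+1))"
    )


print(multi_number_regex(3))
```

For base 10 this reproduces the expression from the EDIT character for character. Note that \d still accepts digits that are not valid in a smaller base, so the input has to be checked separately.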

Solution 1 · 4 · accepted · 2016-11-07 14:25:13
