Regular expressions and checking occurrences

Question

I am currently working with regular expressions, and I am still trying to understand how to properly approach this kind of problem.

So let's say I have this regular expression:

[A − Z]
∗01∗
[ˆ[A − Z]]{3}

over the alphabet [A-Z][0-9] (uppercase letters and digits).

First question: does {3} mean that there must be at least 3 characters belonging to some "part" of the regular expression (let's say 3 of [A − Z]), or does it strictly refer to the last part ([ˆ[A − Z]])?

My second doubt: if it refers to the last part, checking for at least 3 occurrences might be easy (just 3 states that check whether the char is a digit, otherwise exit), right? Otherwise, if it could be any part of the regular expression, how do I check how many occurrences repeat in any possible state without a counter (and possibly confirm whether I shouldn't be using a counter at all)?

I am not really interested in a solution with code; I just want to fully understand the topic.
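The "just 3 states" idea from the question can be sketched as a hand-built automaton, assuming {3} applies only to the final [ˆ[A − Z]] part and the input uses only the alphabet [A-Z][0-9]. The function name `accepts_three_digits` is hypothetical. The machine's only memory is which of four states (0 to 3) it is in, so no unbounded counter is involved. Note that {3} means exactly three, so a fourth digit is rejected.

```python
import string

# Assumed input alphabet from the question: uppercase letters and digits.
ALPHABET = frozenset(string.ascii_uppercase + string.digits)
ACCEPTING = 3  # the state reached after exactly three digits


def accepts_three_digits(text: str) -> bool:
    """Hand-built automaton for [^A-Z]{3} (exactly three non-letters).

    States 0 -> 1 -> 2 -> 3 record how many digits have been read so far;
    any letter, or a fourth digit, sends the input to a rejecting dead state.
    """
    state = 0
    for ch in text:
        if ch not in ALPHABET:
            raise ValueError(f"{ch!r} is not in the alphabet [A-Z][0-9]")
        if ch.isdigit() and state < ACCEPTING:
            state += 1  # advance one state along the chain
        else:
            return False  # dead state: acceptance is no longer reachable
    return state == ACCEPTING


for sample in ["123", "12", "1234", "1A3"]:
    print(sample, accepts_three_digits(sample))
```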

Answer 1

Regular expressions are a formal mathematical construction, but the syntaxes for describing them may vary. In common syntaxes, {3} means the previous item is repeated three times. For example, [AB]{3} is the same as [AB][AB][AB], so it will match AAA, AAB, ABA, ABB, BAA, BAB, BBA, or BBB. Or (AA|B){2} will match AAAA, AAB, BAA, or BB. It does not require that there be two characters. It requires that there be two matches of (AA|B).
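To check the lists above, a small brute-force sketch can use Python's `re` module (whose {n} and | follow the common syntax described here) to enumerate every short string over {A, B} and keep those the whole pattern matches. The helper name `all_matches` is hypothetical.

```python
import itertools
import re


def all_matches(pattern, alphabet, max_len):
    """Every string over `alphabet` of length <= max_len that fully matches `pattern`."""
    compiled = re.compile(pattern)
    found = []
    for length in range(max_len + 1):
        for chars in itertools.product(alphabet, repeat=length):
            candidate = "".join(chars)
            if compiled.fullmatch(candidate):
                found.append(candidate)
    return found


print(all_matches(r"[AB]{3}", "AB", 4))
# ['AAA', 'AAB', 'ABA', 'ABB', 'BAA', 'BAB', 'BBA', 'BBB']
print(all_matches(r"(AA|B){2}", "AB", 4))
# ['BB', 'AAB', 'BAA', 'AAAA']
```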

What the “previous item” is may depend on the particular syntax you are using. For example, in AA|B{2}, either | or {…} could be given higher precedence, so it could mean AA|(B{2}) or (AA|B){2}, depending on the rules of your syntax. However, in the specific example you asked about, the brackets clearly form a unit, so [ˆ[A − Z]]{3} requires three matches of [ˆ[A − Z]]. Again assuming a common syntax, [ˆ[A − Z]] means one character that does not match [A-Z], so a character that is not A through Z. Since your alphabet consists only of A through Z and 0 through 9, [^[A-Z]] matches 0 through 9.
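As one concrete data point, Python's `re` module (like PCRE) gives a quantifier higher precedence than |, so AA|B{2} is parsed as AA|(B{2}). A quick check, with the parenthesised form for comparison:

```python
import re

samples = ["AA", "BB", "AAB", "BAA", "AAAA"]

# Quantifier binds tighter than |: this means AA or BB.
unparenthesised = [s for s in samples if re.fullmatch(r"AA|B{2}", s)]
# Explicit group: two matches of (AA|B).
grouped = [s for s in samples if re.fullmatch(r"(AA|B){2}", s)]

print(unparenthesised)  # ['AA', 'BB']
print(grouped)          # ['BB', 'AAB', 'BAA', 'AAAA']
```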

Thus [^[A-Z]]{3} matches a three-digit numeral and nothing else.
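This claim can be checked exhaustively over all 3-character strings from the alphabet. The sketch writes the class as [^A-Z], because in Python's `re` (and PCRE) a [ inside brackets is an ordinary character rather than a nested set, so [^[A-Z]] would be read differently (the first ] closes the set).

```python
import itertools
import re
import string

alphabet = string.ascii_uppercase + string.digits  # the question's [A-Z][0-9]
pattern = re.compile(r"[^A-Z]{3}")

accepted = [
    "".join(chars)
    for chars in itertools.product(alphabet, repeat=3)
    if pattern.fullmatch("".join(chars))
]

print(len(accepted))                       # 1000
print(all(s.isdigit() for s in accepted))  # True
print(accepted[0], accepted[-1])           # 000 999
```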

Answer 2

First, there are a number of problems with your regex.

I believe your "smart" editor has mangled the regex. It has replaced ^ (U+005E CIRCUMFLEX ACCENT) and - (U+002D HYPHEN-MINUS) with fancy versions: ˆ (U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT) and − (U+2212 MINUS SIGN). They look the same, but they are different characters and have different meanings in a regex. To avoid this, be sure to use a good code editor such as Atom.
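One way to spot this kind of substitution is to list every non-ASCII character in the pattern with its Unicode name, using Python's standard `unicodedata` module. In this sketch the question's regex is written with escapes so the lookalike characters are unambiguous; `non_ascii_report` is a hypothetical helper name.

```python
import unicodedata

# The question's [ˆ[A − Z]]{3}, with the lookalikes spelled as escapes.
pasted = "[\u02c6[A \u2212 Z]]{3}"


def non_ascii_report(text):
    """Code point and Unicode name of each distinct non-ASCII character in text."""
    return [
        f"U+{ord(ch):04X} {unicodedata.name(ch)}"
        for ch in sorted(set(text))
        if not ch.isascii()
    ]


print(non_ascii_report(pasted))
# ['U+02C6 MODIFIER LETTER CIRCUMFLEX ACCENT', 'U+2212 MINUS SIGN']
print(non_ascii_report("[^A-Z]{3}"))
# []
```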

Spaces are also important: [A - Z] means something different from [A-Z]. The same goes for newlines; they are treated literally.
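For instance, in Python's `re` module the spaces turn [A - Z] into the three-member set {A, space, Z}: the " - " in the middle is read as a range from the space character to itself, so even the hyphen is not in the set. A quick comparison:

```python
import re

samples = ["A", "M", "Z", " ", "-"]
in_range = [s for s in samples if re.fullmatch(r"[A-Z]", s)]
with_spaces = [s for s in samples if re.fullmatch(r"[A - Z]", s)]

print(in_range)     # ['A', 'M', 'Z']
print(with_spaces)  # ['A', 'Z', ' ']
```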

∗01∗ does not mean "match 01 surrounded by anything." Regexes don't work like file globs. While * does mean "zero or more" as in a file glob, in a regex it means "zero or more of the immediately preceding thing". To match (almost) any character, use . (a dot). So it would be .*01.*.
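The difference can be seen side by side with Python's standard `fnmatch` (glob-style matching) and `re` modules. This sketch uses the ASCII * throughout.

```python
import fnmatch
import re

text = "XY01Z"

# Glob: * is a wildcard by itself, so *01* means "01 somewhere".
print(fnmatch.fnmatchcase(text, "*01*"))      # True

# Regex: * repeats the item before it, so a leading * has nothing to repeat.
try:
    re.compile("*01*")
    leading_star_ok = True
except re.error as exc:
    leading_star_ok = False
    print("re.error:", exc)

# In 01*, the * applies only to the 1: it matches 0, 01, 011, 0111, ...
print(bool(re.fullmatch(r"01*", "0111")))     # True
print(bool(re.fullmatch(r"01*", "0101")))     # False

# The regex spelling of the glob *01* is .*01.*
print(bool(re.fullmatch(r".*01.*", text)))    # True
```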

[ˆ[A − Z]]{3} should be [^A-Z]{3}. [^...] means to match a character that is not in the set. [^A-Z]{3} means to match exactly 3 characters, none of which is between A and Z, such as 123, abc, or !@#.
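A quick check with Python's `re`, which is case-sensitive by default, so lowercase letters count as "not between A and Z":

```python
import re

three_non_letters = re.compile(r"[^A-Z]{3}")
samples = ["123", "abc", "!@#", "1A3", "12", "1234"]
results = {s: bool(three_non_letters.fullmatch(s)) for s in samples}
print(results)
# {'123': True, 'abc': True, '!@#': True, '1A3': False, '12': False, '1234': False}
```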

Putting it all together: [A-Z].*01.*[^A-Z]{3} says to match exactly one character in the set A through Z, then anything, then exactly 01, then anything, then exactly 3 characters that are not in the set A through Z. C01;;; and blah blah Z blah 01 blah blah abc both match.
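A sketch of the combined pattern in Python's `re`. The second example matches only as a substring search (re.search): the pattern is unanchored and that text does not start with an uppercase letter, so re.fullmatch rejects it.

```python
import re

combined = re.compile(r"[A-Z].*01.*[^A-Z]{3}")

first = combined.search("C01;;;")
second = combined.search("blah blah Z blah 01 blah blah abc")
third = combined.search("C01AB")

print(first.group())   # C01;;;
print(second.group())  # Z blah 01 blah blah abc
print(third)           # None: no three non-letters after the 01
print(combined.fullmatch("blah blah Z blah 01 blah blah abc"))  # None
```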

Regex 101 is a valuable resource for understanding regexes. Regular-Expressions.info is a very good tutorial site.

First question: does {3} mean that there must be at least 3 characters belonging to some "part" of the regular expression (let's say 3 of [A − Z]), or does it strictly refer to the last part ([ˆ[A − Z]])?

{3} is a "quantifier". So are + (one or more), * (zero or more), and ? (zero or one). All quantifiers apply to the thing immediately preceding them. A{3} means "AAA". [A-Z]{3} means exactly three characters from the set A through Z.
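The quantifiers can be compared directly with Python's `re.fullmatch`; the AB{2} and (AB){2} rows show that a quantifier repeats only the single item before it unless you group.

```python
import re

cases = [
    (r"A{3}", "AAA"),      # exactly three A's
    (r"A{3}", "AA"),
    (r"A+", ""),           # + needs at least one
    (r"A*", ""),           # * allows zero
    (r"A?", "AA"),         # ? allows at most one
    (r"AB{2}", "ABB"),     # {2} applies only to the B
    (r"AB{2}", "ABAB"),
    (r"(AB){2}", "ABAB"),  # group first to repeat AB
    (r"[A-Z]{3}", "XYZ"),
]
for pattern, text in cases:
    print(pattern, repr(text), bool(re.fullmatch(pattern, text)))
```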

My second doubt: if it refers to the last part, checking for at least 3 occurrences might be easy (just 3 states that check whether the char is a digit, otherwise exit), right? Otherwise, if it could be any part of the regular expression, how do I check how many occurrences repeat in any possible state without a counter (and possibly confirm whether I shouldn't be using a counter at all)?

Regular expressions are insanely complicated; they are a language unto themselves. Unless this is for a class, use a regular expression library such as PCRE.

Answer 1
0 2020-06-26 00:55:52

Answer 2
0 2020-06-26 01:04:20
