Regex for boolean logic

I am trying to use a regex to validate a string. It should allow whitespace between a string and a boolean operator, as in (@string1 OR), but disallow whitespace inside a string, as in (string 1). The allowed boolean forms are:

(A AND B) AND (NOT C)
(A OR B) AND (NOT C)
(A AND B)
(A OR B)
(NOT C)

Examples of possible valid and invalid inputs are below.

Valid:

(@string1 OR @string2) AND ( NOT @string3)
(@string-1 AND @string.2) AND ( NOT @string_3)
(@string1 OR @string2 OR @string4) AND ( NOT @string3 AND NOT @string5)
(@string1    OR   @string2   OR    @string4)
(@string1 AND @string2 AND @string4)
( NOT @string1 AND NOT @string2 AND NOT @string4)
( NOT @string1 AND NOT @string2)

Invalid:

()
(string  1 OR @str ing2) AND ( NOT @tag3)
(@string 1 OR @tag 2) AND ( NOT @string 3)
(@string1  @string2) ( NOT @string3)
(@string1 OR @string12) AND (@string3)
(@string1 AND NOT @string2)

Is it better to parse the string and then have multiple regexes check for the absence of whitespace, or can a single regex be written to check the entire string?

This kind of sophisticated validation would be best solved with a grammar parser.

Just to get you started, here is an (incomplete) solution in parslet. As you can see, you build up from primitives and construct more and more complicated structures.

require 'parslet'

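# Grammar for simple boolean expressions such as "@a AND (@b OR @c)".
class Boolean < Parslet::Parser
  # Whitespace: one or more spaces, plus an optional variant.
  rule(:space)  { match[" "].repeat(1) }
  rule(:space?) { space.maybe }

  # Parentheses and operators each consume any trailing spaces.
  rule(:lparen) { str("(") >> space? }
  rule(:rparen) { str(")") >> space? }

  rule(:and_operator) { str("AND") >> space? }
  rule(:or_operator) { str("OR") >> space? }
  rule(:not_operator) { str("NOT") >> space? }

  # A token is "@" followed by lowercase letters or digits.
  # Note: repeat also accepts an empty name; repeat(1) would require at least
  # one character, and "-", "." and "_" would have to be added to the class
  # to cover the tokens in the question.
  rule(:token) { str("@") >> match["a-z0-9"].repeat >> space? }

  # The primary rule deals with parentheses.
  rule(:primary) { lparen >> expression >> rparen | token }

  # Each binary rule joins exactly two primaries; chains like "A OR B OR C"
  # are not handled yet.
  rule(:and_expression) { primary >> and_operator >> primary }
  rule(:or_expression) { primary >> or_operator >> primary }
  rule(:not_expression) { not_operator >> primary }

  # Alternatives are tried in order; the first one that matches wins.
  rule(:expression) { or_expression | and_expression | not_expression | primary }

  root(:expression)
end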

You can test a string with this little helper method:

def parse(str)
  exp = Boolean.new
  exp.parse(str)
  puts "Valid!"
rescue Parslet::ParseFailed => failure
  puts failure.parse_failure_cause.ascii_tree
end

parse("@string AND (@string2 OR @string3)")
#=> Valid!
parse("(string1 AND @string2)")
#=> Expected one of [OR_EXPRESSION, AND_EXPRESSION, NOT_EXPRESSION, PRIMARY] at line 1 char 1.
#   ...
#   - Failed to match sequence ('@' [a-z0-9]{0, } SPACE?) at line 1 char 2.
#      - Expected "@", but got "s" at line 1 char 2.

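You need recursion or a loop, and parsing this correctly takes a stack; doing it with regular expressions alone would be very difficult, even if the goal is only validation.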
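
To illustrate the loop-based approach, here is a minimal sketch of a hand-written validator that uses only Ruby's standard StringScanner. The class name BooleanValidator, its helper methods, and the grammar in the comments are my own assumptions, inferred from the valid and invalid examples in the question (one operator per group, an optional trailing NOT group, no nesting). It is not taken from either answer, so treat it as a starting point rather than a finished solution.

```ruby
require 'strscan'

# Hypothetical validator; the grammar is an assumption inferred from the
# question's examples:
#   expression     := not_group | positive_group [ "AND" not_group ]
#   positive_group := "(" token ( OP token )+ ")"   with one OP (AND or OR) per group
#   not_group      := "(" "NOT" token ( "AND" "NOT" token )* ")"
class BooleanValidator
  # Assumed token shape: "@" plus letters, digits, ".", "_" or "-".
  TOKEN = /@[A-Za-z0-9._-]+/

  def self.valid?(input)
    new(input).valid?
  end

  def initialize(input)
    @scanner = StringScanner.new(input)
  end

  def valid?
    skip_spaces
    kind = group
    return false unless kind
    # Only a positive group may be followed by "AND (NOT ...)".
    if kind == :positive && eat(/AND\b/)
      return false unless group == :negative
    end
    @scanner.eos?
  end

  private

  def skip_spaces
    @scanner.skip(/ */)
  end

  # Consumes pattern plus trailing spaces; returns nil when it does not match.
  def eat(pattern)
    matched = @scanner.scan(pattern)
    skip_spaces if matched
    matched
  end

  # Parses one parenthesised group and returns :positive, :negative, or nil.
  def group
    return nil unless eat(/\(/)
    kind = eat(/NOT\b/) ? negative_body : positive_body
    kind if kind && eat(/\)/)
  end

  # token (OP token)+ where the first operator seen must be reused throughout.
  def positive_body
    return nil unless eat(TOKEN)
    operator = @scanner.check(/(?:AND|OR)\b/) # peek without consuming
    return nil unless operator
    while eat(/#{operator}\b/)
      return nil unless eat(TOKEN)
    end
    :positive
  end

  # Called after the leading NOT: token ("AND" "NOT" token)*
  def negative_body
    return nil unless eat(TOKEN)
    while eat(/AND\b/)
      return nil unless eat(/NOT\b/) && eat(TOKEN)
    end
    :negative
  end
end

puts BooleanValidator.valid?("(@string1 OR @string2) AND ( NOT @string3)")
puts BooleanValidator.valid?("(@string 1 OR @tag 2) AND ( NOT @string 3)")
```

Because the forms in the question are not nested, plain loops are enough here. Supporting nested parentheses would mean having group call itself recursively, which is where the stack comes in.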

