Is there a method for generating arbitrary pairs of equivalent regular expressions?

I want to write tests for a regular-expression analysis engine. Ideally I could generate arbitrary pairs of equivalent regular expressions and check that the engine parses both and correctly identifies them as equivalent. Is there a known algorithm for this?

I would also accept an existing list of 20 to 100 well-known regex equivalences, such as a*a and aa*, or (ab)*a and a(ba)*.
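Candidate equivalences like these can be sanity-checked without a full analysis engine by enumerating every string up to some length and comparing whole-string matches. This is a minimal sketch, assuming Python's `re` dialect and a small alphabet; `agree_up_to` is a hypothetical helper, and agreement up to `max_len` is evidence of equivalence, not proof.

```python
import itertools
import re


def agree_up_to(p: str, q: str, alphabet: str, max_len: int) -> bool:
    """Hypothetical helper: True if regexes p and q accept exactly the same
    strings over `alphabet` with length <= max_len. This is a necessary
    condition for equivalence, not a proof of it."""
    rp, rq = re.compile(p), re.compile(q)
    for n in range(max_len + 1):
        # Enumerate every string of length n over the alphabet.
        for chars in itertools.product(alphabet, repeat=n):
            s = "".join(chars)
            if bool(rp.fullmatch(s)) != bool(rq.fullmatch(s)):
                return False  # found a distinguishing string
    return True


# The two example equivalences from the question.
for p, q in [("a*a", "aa*"), ("(ab)*a", "a(ba)*")]:
    print(f"{p!r} vs {q!r}: {agree_up_to(p, q, 'ab', 8)}")
```

Both example pairs report True. A non-equivalent pair such as a* and aa* is rejected immediately, because the empty string distinguishes them.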

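The method I came up with was as follows. I assembled a list of simple regex transformations that preserve equivalence: each one takes a pair (a, b) of regexes already known to be equivalent and returns a new, more complicated pair that is still equivalent. For example: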

  • f(a, b) ⩴ (a*a, bb*)
  • f(a, b) ⩴ (aa?, b?b)
  • f(a, b) ⩴ (ab, ba)
  • f(a, b) ⩴ (a[\d]+, b[0-9]+) (valid only in flavors where \d matches ASCII digits alone)

And so on. I then applied these transformations randomly and iteratively to a trivially equal starting pair such as (x, x). The result is a pair of complicated but equivalent regexes. This generation procedure is well suited to property-based testing.
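The procedure described above can be sketched in Python roughly as follows. This is a minimal sketch, not the author's implementation: `g`, `TRANSFORMS` and `random_equivalent_pair` are hypothetical names, regexes are plain strings in Python's `re` dialect, and each operand is wrapped in a non-capturing group `(?:...)`. The grouping matters once operands grow: without it, a*a with a = xy would expand to xy*xy, which is not equivalent to xyxy*. The `[\d]+` rule assumes ASCII matching, hence `re.ASCII` when compiling.

```python
import random
import re


def g(r: str) -> str:
    """Wrap a regex in a non-capturing group so a following operator
    (*, ?, +) or concatenation applies to the whole of it."""
    return f"(?:{r})"


# Hypothetical transformation table mirroring the list above. Each entry maps
# an equivalent pair (a, b) to a new pair that is still equivalent.
TRANSFORMS = [
    lambda a, b: (f"{g(a)}*{g(a)}", f"{g(b)}{g(b)}*"),  # a*a    vs  bb*
    lambda a, b: (f"{g(a)}{g(a)}?", f"{g(b)}?{g(b)}"),  # aa?    vs  b?b
    lambda a, b: (f"{g(a)}{g(b)}", f"{g(b)}{g(a)}"),    # ab     vs  ba
    # [\d] equals [0-9] only in ASCII mode (compile with re.ASCII in Python).
    lambda a, b: (rf"{g(a)}[\d]+", f"{g(b)}[0-9]+"),    # a[\d]+ vs  b[0-9]+
]


def random_equivalent_pair(rng: random.Random, steps: int, seed: str = "x"):
    """Start from the trivially equal pair (seed, seed) and apply `steps`
    randomly chosen transformations."""
    a, b = seed, seed
    for _ in range(steps):
        a, b = rng.choice(TRANSFORMS)(a, b)
    return a, b


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so a failing pair can be regenerated
    a, b = random_equivalent_pair(rng, steps=3)
    print(a)
    print(b)
    # Both patterns should compile; the engine under test would then be
    # asked whether a and b are equivalent.
    re.compile(a, re.ASCII)
    re.compile(b, re.ASCII)
```

Regex size roughly doubles with each step, so small step counts are advisable. Passing in a seeded `random.Random` keeps failing cases reproducible; in a property-based test the seed or random source could instead come from the testing framework.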
