
Reversed regex machine implementation

I'm trying to match a string starting from its last character, so that a non-matching string fails as early as possible. This way I can reject a custom string cstr (see the specification below) with the fewest possible operations (see property 4).

From a theoretical perspective, a regex can be represented as a finite state machine, and flipping its arrows (while swapping the start and accepting states) yields a machine for the reversed regex.
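The arrow-flipping idea can be illustrated with a minimal sketch. It assumes an NFA without epsilon transitions, stored as a dict from (state, symbol) to a set of target states; the names reverse_nfa and match_from_end and the hand-built machine for ab*c are hypothetical and do not come from any library.

```python
from collections import defaultdict


def reverse_nfa(transitions, starts, accepts):
    """Flip every arrow and swap the start and accept sets."""
    flipped = defaultdict(set)
    for (src, symbol), targets in transitions.items():
        for dst in targets:
            flipped[(dst, symbol)].add(src)
    return dict(flipped), set(accepts), set(starts)


def match_from_end(transitions, starts, accepts, chars_from_end):
    """Run a (reversed) NFA on characters supplied last-to-first.

    Returns (matched, consumed), where consumed counts how many
    characters had to be read before the verdict was known.
    """
    current = set(starts)
    consumed = 0
    for ch in chars_from_end:
        consumed += 1
        current = {
            nxt
            for state in current
            for nxt in transitions.get((state, ch), ())
        }
        if not current:  # no live state left: fail without reading further
            return False, consumed
    return bool(current & accepts), consumed


# Hand-built forward NFA for the regex ab*c (assumed example).
forward = {(0, "a"): {1}, (1, "b"): {1}, (1, "c"): {2}}
rev, rev_starts, rev_accepts = reverse_nfa(forward, {0}, {2})

print(match_from_end(rev, rev_starts, rev_accepts, reversed("abbc")))  # (True, 4)
print(match_from_end(rev, rev_starts, rev_accepts, reversed("abbx")))  # (False, 1)
```

Because chars_from_end is consumed as an iterator, a lazily computed string only pays for the characters that are actually needed before the match fails.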

I'm looking for an implementation of this: a library or program to which I can give the string and the pattern. cstr is implemented in Python, so a Python module if possible. (For the curious: the i-th character is not calculated until it is needed.) Anything else would mean much more work, because cstr's calculation is hard to port to another language.

The implementation doesn't have to cover all regex syntax. I'm looking for the basics: no lookaheads or fancy stuff. See the specification below.

I may be lacking common knowledge, so please do comment on obvious things, too.


Specification

The custom string cstr has the following properties:

  1. The string can be calculated in finite time.
  2. The string has finite length.
  3. The last character is known.
  4. Every previous character requires a costly calculation.
  5. Until the string is fully calculated, its length is unknown.
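To make these properties concrete, here is a minimal sketch of how such a string could be modelled. It assumes the costly calculation can be wrapped in an iterator that yields characters last-to-first; the class name LazyTail and its methods are hypothetical, and reversed("hello") merely stands in for the real computation.

```python
class LazyTail:
    """Caches characters that a costly producer yields last-to-first."""

    def __init__(self, producer):
        self._producer = iter(producer)
        self._cache = []  # _cache[0] is the last character of the string

    def from_end(self, i):
        """Return the i-th character counted from the end (0 = last).

        Returns None when the string turns out to be shorter than i + 1.
        """
        while len(self._cache) <= i:
            try:
                self._cache.append(next(self._producer))
            except StopIteration:
                return None
        return self._cache[i]

    def computed(self):
        """Number of characters calculated so far."""
        return len(self._cache)

    def __iter__(self):
        """Yield characters last-to-first, computing each only on demand."""
        i = 0
        while (ch := self.from_end(i)) is not None:
            yield ch
            i += 1


tail = LazyTail(reversed("hello"))  # stand-in for the costly calculation
print(tail.from_end(0), tail.computed())  # o 1
```

Iterating over such an object feeds a matcher one character at a time from the end, so a matcher that stops early never triggers the remaining calculations.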

Once the string is fully calculated, I want to match it against a simple regex that may contain the following syntax elements. No lookaheads or fancy stuff.

  • alphanumeric characters
  • Unicode characters
  • . , * , + , ? , \w , \W , [] , | , escape char \ , range specification with { , }

PS: This is not a homework question. I'm trying to formulate my question as clearly as possible.

OP here. Here are some thoughts:

  • Since I'm looking for an unoptimized regex machine, I have to build it myself, which takes time.

  • Alternatively, we can define an upper bound for the length of cstr and generate all strings of length < upper bound that match the given regex. Then we put all of these strings into a trie and match against it. Whether this is practical depends on the use case, and a cache may be involved.

  • What I'm going for is the Python module greenery:

from greenery import parse
pattern = parse.Pattern(...)
pattern.reversed()
...

This sometimes provides a good matching experience. Sometimes it does not, but that is OK for me.
