Reversed regular expression machine implementation

Question

I'm trying to match a string starting from its last character, so that a failed match is detected as early as possible. This way I can reject a custom string cstr (see the specification below) with the fewest operations, since every earlier character is costly to compute (4th property).

From a theoretical perspective, a regex can be represented as a finite state machine, and flipping its arrows (while swapping the start and accepting states) yields a machine for the reversed regex.
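To make the arrow-flipping idea concrete, here is a minimal sketch, assuming an NFA without epsilon transitions that is stored as a set of (source, symbol, destination) triples. The names `reverse_nfa` and `accepts_backwards` and the hand-built machine for `ab*c` are illustrative only and do not come from any library.

```python
from collections import defaultdict

def reverse_nfa(transitions, starts, accepts):
    """Flip every arrow and swap the start and accept sets."""
    flipped = {(dst, sym, src) for (src, sym, dst) in transitions}
    return flipped, set(accepts), set(starts)

def accepts_backwards(transitions, starts, accepts, chars_from_end):
    """Simulate the NFA on an iterable that yields characters last-first."""
    delta = defaultdict(set)
    for src, sym, dst in transitions:
        delta[(src, sym)].add(dst)
    current = set(starts)
    for ch in chars_from_end:
        current = {d for s in current for d in delta[(s, ch)]}
        if not current:
            # No live state is left, so no earlier character needs computing.
            return False
    return bool(current & set(accepts))

# Hand-built NFA for the regex ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2
forward = {(0, "a", 1), (1, "b", 1), (1, "c", 2)}
rev, rev_starts, rev_accepts = reverse_nfa(forward, {0}, {2})

print(accepts_backwards(rev, rev_starts, rev_accepts, reversed("abbc")))
print(accepts_backwards(rev, rev_starts, rev_accepts, reversed("abbd")))
```

Because the simulation pulls characters from an iterable, it stops requesting input as soon as the set of live states becomes empty.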

I'm looking for an implementation of this: a library or program to which I can give the string and the pattern. cstr is implemented in Python, so a Python module would be ideal. (For the curious: the i-th character is not calculated until it is needed.) With anything else I would have much more work to do, because cstr's calculation is hard to port to another language.

The implementation doesn't have to cover the full regex syntax. I'm looking for the basics: no lookaheads or other fancy features. See the specification below.

I may be lacking common knowledge, so please do point out obvious things, too.

Specification

The custom string cstr has the following properties:

1. The string can be calculated in finite time.
2. The string has finite length.
3. The last character is known.
4. Every previous character requires a costly calculation.
5. Until the string is fully calculated, its length is unknown.
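The properties above suggest exposing cstr as a stream of characters read from the end. The following is only a sketch under assumptions: `chars_from_end` and `fake_previous` are hypothetical names, and `fake_previous` merely stands in for the real costly calculation, which is not shown in the question.

```python
from typing import Callable, Iterator, Optional

def chars_from_end(last: str,
                   previous: Callable[[int], Optional[str]]) -> Iterator[str]:
    """Yield characters last-first; earlier ones are computed only on demand."""
    yield last  # the last character is known up front
    offset = 1
    while True:
        ch = previous(offset)  # the costly step in the real setting
        if ch is None:         # start of the string reached, length now known
            return
        yield ch
        offset += 1

# Stand-in for the costly calculation: reads from a hidden string and logs calls.
_HIDDEN = "hello"
calls = []

def fake_previous(offset: int) -> Optional[str]:
    calls.append(offset)
    index = len(_HIDDEN) - 1 - offset
    return _HIDDEN[index] if index >= 0 else None

stream = chars_from_end(_HIDDEN[-1], fake_previous)
```

A matcher that consumes `stream` one character at a time and stops at the first mismatch never triggers the remaining calculations.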

When the string is fully calculated, I want to match it against a simple regex that may use only the following parts of the syntax. No lookaheads or other fancy features.

alphanumeric characters
Unicode characters
. , * , + , ? , \w , \W , [] , | , the escape character \ , and range specification with { , }
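For exactly this subset (no groups, anchors, or lazy quantifiers), the reversed regex can also be built on the pattern text instead of on the state machine: split on `|`, then reverse the order of the atoms in each branch while keeping each quantifier attached to its atom. This is a sketch of that textual shortcut, not of the arrow-flipping construction, and `reverse_regex` is a hypothetical helper name.

```python
import re

def reverse_regex(pattern: str) -> str:
    """Reverse a group-free pattern branch by branch, atom by atom."""
    branches, units, i = [], [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|":
            branches.append("".join(reversed(units)))
            units = []
            i += 1
            continue
        if ch == "\\":                      # escape: backslash plus one char
            j = i + 2
        elif ch == "[":                     # character class up to closing ]
            j = i + 1
            while pattern[j] != "]":
                j += 2 if pattern[j] == "\\" else 1
            j += 1
        else:                               # single literal or "."
            j = i + 1
        # Keep a trailing quantifier attached to the atom it modifies.
        if j < len(pattern) and pattern[j] in "*+?":
            j += 1
        elif j < len(pattern) and pattern[j] == "{":
            j = pattern.index("}", j) + 1
        units.append(pattern[i:j])
        i = j
    branches.append("".join(reversed(units)))
    return "|".join(branches)

reversed_pattern = reverse_regex(r"\w+[0-9]{2,3}|x?y")
print(reversed_pattern)
print(bool(re.fullmatch(reversed_pattern, "ab12"[::-1])))
```

The reversed pattern can then be fed to any matcher that consumes its input incrementally from the last character; Python's `re` module is used here only to check the result on fully known strings.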

PS: This is not a homework question. I'm trying to formulate my question as clearly as possible.

Answer 1

OP here. Here are some thoughts:

Since I'm looking for an unoptimized regex machine, I would have to build it myself, which takes time.
Alternatively, we can define an upper bound on the length of cstr and generate all strings shorter than that bound that match the given regex. Then we put all of them into a trie data structure and match against it. Whether this is practical depends on the use case, and a cache may help.
What I'm going with is the Python module greenery:

from greenery import parse
pattern = parse.Pattern(...)
pattern.reversed()
...

This sometimes provides a good matching experience. Sometimes it doesn't, but that is OK for me.
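The trie alternative from the second point above can be sketched as follows. It assumes a small, known alphabet and enumerates candidates by brute force with the standard `re` module, then stores each match reversed so that lookup proceeds from the last character. All function names here are hypothetical, and storing the strings reversed is an assumption made to fit the match-from-the-end goal.

```python
import re
from itertools import product

END = object()  # marker key for "a complete string ends here"

def enumerate_matches(pattern, alphabet, upper_bound):
    """Yield every string over alphabet with length < upper_bound that matches."""
    compiled = re.compile(pattern)
    for length in range(upper_bound):
        for letters in product(alphabet, repeat=length):
            candidate = "".join(letters)
            if compiled.fullmatch(candidate):
                yield candidate

def build_reversed_trie(words):
    """Insert each word last character first."""
    root = {}
    for word in words:
        node = root
        for ch in reversed(word):
            node = node.setdefault(ch, {})
        node[END] = True
    return root

def match_from_end(trie, chars_from_end):
    """Walk the trie on characters supplied last-first, failing early."""
    node = trie
    for ch in chars_from_end:
        node = node.get(ch)
        if node is None:
            return False
    return END in node

trie = build_reversed_trie(enumerate_matches(r"ab*c", "abc", 5))
print(match_from_end(trie, reversed("abbc")))
print(match_from_end(trie, reversed("abca")))
```

The enumeration grows exponentially with the bound and the alphabet size, so this only suits short strings over small alphabets; strings at or beyond the bound are rejected even if they match the regex.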
