
Concatenating two capture groups

I have a string that can be split into 3 parts (Keep1 | Ignore | Keep2). The objective is to remove the middle substring and concatenate the other two. To achieve this I wrote two regular expressions, one that matches Keep1 and another that matches Keep2.

Example text:

First String.<ref> IGNORE </ref> Second String.

First regular expression:

.*(?=<ref>)

Output:

First String.

Second regular expression:

(?<=<\/ref>).*

Output:

Second String.   

Desired output:

First String. Second String.

So far I've been unable to figure out a way to concatenate the two strings. Is such a thing possible in flex?

(F)lex does not implement capture groups, nor does it implement lookahead assertions. In general terms, it only implements constructs which meet the mathematical definition of "regular expression", and which can therefore be implemented with a simple finite state machine working in linear time and constant space.

The (short and complete) documentation of its regular expression syntax is found in the Flex manual.

(The "f" in "flex" stands for "fast", but the original "lex" was also pretty snappy, basically because of this design decision.)

You have two choices, depending on the precise nature of your tokens:

  1. If you can definitely recognise the token from the first part, then you could use a start condition to recognise the rest of the token.

  2. Otherwise, you could recognise the entire token in one regular expression, and then rescan it to figure out which part you want to keep. You might or might not be able to do the second scan with flex; again, you could use a start condition to apply different rules for the rescan, but it will depend on the precise nature of your pattern. You could also rescan with a regular expression library, either the Posix standard library or some more flexible library such as PCRE.
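As a rough illustration of the "rescan with the Posix standard library" option, here is a minimal C sketch using `regcomp` and `regexec` from `<regex.h>`. The function name `strip_ref` and the pattern are assumptions made for this example, not something from the question. It also assumes the token contains a single `<ref>...</ref>` pair, because POSIX repetition is greedy and `(.*)` would otherwise swallow earlier pairs.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copy Keep1 followed by Keep2 into `out`,
 * dropping the <ref>...</ref> part in between.
 * Returns 0 on success, -1 if there is no match, the pattern fails
 * to compile, or `out` is too small. */
int strip_ref(const char *input, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[3];   /* m[0] = whole match, m[1] = Keep1, m[2] = Keep2 */
    int rc = -1;

    assert(input != NULL && out != NULL);

    /* POSIX extended syntax has capture groups but no lookaround,
     * so the delimiters are matched explicitly between the groups. */
    if (regcomp(&re, "^(.*)<ref>.*</ref>(.*)$", REG_EXTENDED) != 0)
        return -1;

    if (regexec(&re, input, 3, m, 0) == 0) {
        size_t len1 = (size_t)(m[1].rm_eo - m[1].rm_so);
        size_t len2 = (size_t)(m[2].rm_eo - m[2].rm_so);

        /* Concatenate the two captured ranges if they fit. */
        if (len1 + len2 + 1 <= out_size) {
            memcpy(out, input + m[1].rm_so, len1);
            memcpy(out + len1, input + m[2].rm_so, len2);
            out[len1 + len2] = '\0';
            rc = 0;
        }
    }

    regfree(&re);
    return rc;
}
```

In a flex action this could plausibly be called on `yytext` once a rule has matched the entire token; that wiring is left out here because it depends on how the rest of the scanner is written.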

Note that (f)lex also does not implement non-greedy repetition, so if you want to implement "the shortest string starting with X and ending with Y", you need to use a technique like the one shown in the (last) example in the Flex manual chapter on start conditions.
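To make the "shortest string starting with X and ending with Y" idea concrete, here is a small sketch in plain C. It is not the start-condition technique from the Flex manual; it swaps in `strstr` to show the same behaviour: once the opening delimiter is seen, everything is skipped up to the nearest closing delimiter. The function name `strip_shortest` and its parameters are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copy `input` into `out`, skipping every span that
 * starts with `open` and ends at the NEAREST following `close`.
 * Returns 0 on success, -1 if `out` is too small. */
int strip_shortest(const char *input, const char *open, const char *close,
                   char *out, size_t out_size)
{
    size_t used = 0;
    const char *p = input;

    assert(input != NULL && out != NULL);
    assert(open != NULL && *open != '\0');
    assert(close != NULL && *close != '\0');
    if (out_size == 0)
        return -1;

    for (;;) {
        const char *start = strstr(p, open);
        /* The first `close` after `open` yields the shortest span. */
        const char *stop = start ? strstr(start + strlen(open), close) : NULL;
        /* Without a complete span, the rest of the input is kept as-is. */
        size_t keep = stop ? (size_t)(start - p) : strlen(p);

        if (keep >= out_size - used)   /* leave room for the terminator */
            return -1;
        memcpy(out + used, p, keep);
        used += keep;

        if (stop == NULL)
            break;
        p = stop + strlen(close);      /* resume after the skipped span */
    }

    out[used] = '\0';
    return 0;
}
```

With the input `A<ref>x</ref>B<ref>y</ref>C`, this keeps `ABC`, whereas a greedy match from the first `<ref>` to the last `</ref>` would also discard the `B`.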
