Subexpression in regular expressions in Oracle

Why do we need a subexpression in regular expressions in Oracle?

It is a new feature of Oracle 11g: we can specify which subexpression of the pattern we want to find. This parameter can be used in REGEXP_SUBSTR and REGEXP_INSTR.

Here is an example from the docs:

SELECT REGEXP_INSTR('1234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2) 
"REGEXP_INSTR" FROM DUAL;

REGEXP_INSTR
-------------------
4

But it isn't clear to me why we really need this parameter (in the above example we could just use the pattern 4(56)(78)). Does anyone have a real-world example?

Oracle regex does not support lookaround. Just as the ^ and $ anchors specify the start and end of the string, lookarounds (lookbehind/lookahead) let you require (or forbid) a pattern immediately before or after the pattern you are interested in, without making it part of the match. Subexpressions are a way to work around that.

For example, consider the following values in a column, from which you need to select only the pickup date.

event_dte
----------------------
pickup_dte 2015-04-03
shipped_dte 2015-03-02
PU_dte 2015-03-11
pickup_date 2014-05-02
delivery_dte 2015-07-11

The column holds every kind of date, and the wording for the pickup date is not consistent.

You can write a regex like (pickup|PU)_d(a?)te (\d{4}-\d{2}-\d{2}). This matches the whole value, so when used in REGEXP_SUBSTR it returns the entire string. If you use subexpressions you can extract only the date part. For the above example, it is the third subexpression.
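A minimal sketch of that extraction. The sample rows are built inline in a hypothetical `events` subquery (the table name is not from the original post), and the pattern is written with single backslashes for `\d`, which is an assumption about what the answer intended:

```sql
-- Hypothetical sample data, built inline so the query is self-contained.
WITH events AS (
  SELECT 'pickup_dte 2015-04-03' AS event_dte FROM DUAL UNION ALL
  SELECT 'shipped_dte 2015-03-02'             FROM DUAL UNION ALL
  SELECT 'PU_dte 2015-03-11'                  FROM DUAL UNION ALL
  SELECT 'pickup_date 2014-05-02'             FROM DUAL UNION ALL
  SELECT 'delivery_dte 2015-07-11'            FROM DUAL
)
SELECT REGEXP_SUBSTR(event_dte,
                     '(pickup|PU)_d(a?)te (\d{4}-\d{2}-\d{2})',
                     1,    -- start searching at the first character
                     1,    -- first occurrence of the pattern
                     'c',  -- case-sensitive matching
                     3)    -- return only the third subexpression (the date)
         AS pickup_date
FROM   events
-- Keep only the rows that are pickup dates at all.
WHERE  REGEXP_LIKE(event_dte, '(pickup|PU)_d(a?)te \d{4}-\d{2}-\d{2}', 'c');
```

With the five sample values this should return only the three pickup dates, without the `pickup_dte` / `PU_dte` / `pickup_date` prefix. The prefix still has to match, which is roughly what a lookbehind would give you in other regex flavors.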

Well, I figured it out, so in case someone is interested, here is my answer:

We use a subexpression when we want to find a specific string only where it follows some other string, which in turn may have to follow yet another string, and so on (or, the other way around, to find a string that is followed by certain strings).

So for the example above, I modified the source string:

SELECT REGEXP_INSTR('456781234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2) 
"REGEXP_INSTR" FROM DUAL;

REGEXP_INSTR
-------------------
9

Here we find the position of the string 45678 only where it follows 123, so we get 9 instead of 1.
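To make the difference visible, the two approaches can be run side by side on the same source string. This is only a sketch; the column aliases `without_context` and `with_context` are made up for illustration:

```sql
SELECT
  -- Bare pattern: matches the first 45678, at the very start of the string.
  REGEXP_INSTR('456781234567890', '4(56)(78)') AS without_context,
  -- Full pattern with subexpression 2: the match must start with 123,
  -- but the reported position is that of the (4(56)(78)) group.
  REGEXP_INSTR('456781234567890', '(123)(4(56)(78))', 1, 1, 0, 'i', 2) AS with_context
FROM DUAL;
```

The first column is 1 and the second is 9, which is the "9 instead of 1" difference described above.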
