Manipulating strings with regexp_substr
I have an ETL task for data-warehousing purposes. I need to extract the second part of a string that follows a delimiter, where the delimiter can be '#', 'ý' or '-'. For example, given these test-case strings:
'Tori 1#MHK-MahallaKingaveKD' I should retrieve only 'MHK'
'HPHelm2ýFFS-Tredddline' I should retrieve only 'FFS'
I already tried handling the cases above with:
TRIM(CASE
  WHEN INSTR('HPHelm2ýFFS-Tredddline', '#', 1, 1) > 0
    THEN (REPLACE(
            REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1, 2),
            '#'
          ))
  ELSE (CASE
    WHEN INSTR('HPHelm2ýFFS-Tredddline', '-', 1, 1) > 0
      THEN (REPLACE(
              REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1, 2),
              '-'
            ))
    ELSE (CASE
      WHEN INSTR('HPHelm2ýFFS-Tredddline', '-') = 0 AND INSTR('HPHelm2ýFFS-Tredddline', 'ý') = 0 AND INSTR('HPHelm2ýFFS-Tredddline', '#') = 0
        THEN 'HPHelm2ýFFS-Tredddline'
      ELSE (CASE
        WHEN INSTR('HPHelm2ýFFS-Tredddline', 'ý', 1, 1) > 0
          THEN (REPLACE(
                  REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1, 2),
                  'ý'
                ))
      END)
    END)
  END)
END)
Using the code above I can retrieve:
'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'
Expected output:
'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'
So I have to exclude the '-' and everything after it.
I guess I should modify the regexp_substr pattern, but I can't find a clear solution, since '-' is also used as a delimiter in the CASE WHEN conditions.
I suggest retrieving the second occurrence of one or more characters other than your delimiter characters:
regexp_substr(col, '[^#ý-]+', 1, 2)
Here, the search starts at the first character of the value (1), and the second occurrence is returned (2).
The [^#ý-]+ pattern matches one or more (+) characters other than #, ý and -. Because the - is the last character inside the brackets, it is treated as a literal hyphen rather than a range.
The following will give you what you're looking for:
WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
FROM cteData;
The parentheses around the .* between the two delimiter bracket expressions make the .* a sub-expression, and the final 1 in the parameter list tells REGEXP_SUBSTR to return the value of sub-expression #1. Since there is only one sub-expression in the regular expression, it returns the text matched by the .*, which is what you're looking for.
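The same sub-expression technique can be checked in Python, where `match.group(1)` plays the role of REGEXP_SUBSTR's final subexpression argument (1). This sketch emulates the pattern with Python's `re` rather than running it in Oracle. The helper name `sub_string` is hypothetical. The sketch also shows one caveat of the pattern: `.*` is greedy, so when a value contains more than two delimiters, the capture extends to the last delimiter.

```python
import re

# Mirrors REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1):
# a delimiter, then a captured group, then another delimiter.
PATTERN = re.compile("[#ý-](.*)[#ý-]")


def sub_string(value):
    """Return capture group 1, or None (standing in for NULL) if there is no match."""
    match = PATTERN.search(value)
    return match.group(1) if match else None


for value in ["Tori 1#MHK-MahallaKingaveKD", "HPHelm2ýFFS-Tredddline"]:
    print(repr(value), "->", repr(sub_string(value)))

# Greedy .* runs to the LAST delimiter, so three delimiters give 'B-C', not 'B'.
print(repr(sub_string("A#B-C-D")))
```

For the two sample values this returns 'MHK' and 'FFS'. If the data can contain more than two delimiters, the `[^#ý-]+` occurrence-based pattern from the other answer avoids the greedy-capture issue.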