
Manipulating with regexp_substr

I have an ETL task for data warehousing, and I need to extract the second part of a string: the part that follows a delimiter such as '#', 'ý' or '-'. Example test strings:

'Tori 1#MHK-MahallaKingaveKD': I should retrieve only 'MHK'

'HPHelm2ýFFS-Tredddline': I should retrieve only 'FFS'

I already tried the following on the cases above:

TRIM(CASE
  WHEN INSTR('HPHelm2ýFFS-Tredddline', '#', 1, 1) > 0
    THEN (REPLACE(
           REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1, 2),
           '#'
         ))
  ELSE (CASE
    WHEN INSTR('HPHelm2ýFFS-Tredddline', '-', 1, 1) > 0
      THEN (REPLACE(
             REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1, 2),
             '-'
           ))
    ELSE (CASE
      WHEN INSTR('HPHelm2ýFFS-Tredddline', '-') = 0
       AND INSTR('HPHelm2ýFFS-Tredddline', 'ý') = 0
       AND INSTR('HPHelm2ýFFS-Tredddline', '#') = 0
        THEN 'HPHelm2ýFFS-Tredddline'
      ELSE (CASE
        WHEN INSTR('HPHelm2ýFFS-Tredddline', 'ý', 1, 1) > 0
          THEN (REPLACE(
                 REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1, 2),
                 'ý'
               ))
      END)
    END)
  END)
END)

Using the code above I can retrieve:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'

Expected output:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'

So I have to exclude the '-' and everything after it.

I guess I should modify the regexp_substr pattern, but I can't find a clear solution, since '-' is itself specified as a delimiter in the CASE WHEN statements.

I suggest retrieving the second occurrence of one or more characters other than your delimiter characters:

regexp_substr(col, '[^#ý-]+', 1, 2)

Here, the search starts at the first character of the value (1), and the second occurrence is returned (2).

The [^#ý-]+ pattern matches one or more (+) characters other than #, ý and -.
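
To see how this behaves on the strings from the question, here is a minimal sketch. The names sample_data, str and second_part are illustrative and not part of the original post, and the NVL fallback is an assumption added to mimic the question's rule of returning the whole string when no delimiter is present. It also assumes the database character set stores 'ý' as a single character.

```sql
-- Sketch only: sample_data, str and second_part are hypothetical names.
WITH sample_data AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS str FROM dual UNION ALL
  SELECT 'HPHelm2ýFFS-Tredddline'             FROM dual UNION ALL
  SELECT 'NoDelimiterHere'                    FROM dual
)
SELECT str,
       -- Second run of non-delimiter characters. REGEXP_SUBSTR returns NULL
       -- when there is no second run, so NVL falls back to the original string.
       TRIM(NVL(REGEXP_SUBSTR(str, '[^#ý-]+', 1, 2), str)) AS second_part
  FROM sample_data;
```

The three rows should come back as 'MHK', 'FFS' and 'NoDelimiterHere' respectively. Because all three delimiters live in one character class, the nested CASE expressions from the question are no longer needed.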

The following will give you what you're looking for:

WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
                 SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
  FROM cteData;

The parentheses around the .* between the delimiter groups make the .* a sub-expression, and the final ,1 in the parameter list tells REGEXP_SUBSTR to give you back the value of sub-expression #1. Since there's only one sub-expression in the regular expression, it gives you back the value of the .*, which is what you're looking for.
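
One thing to keep in mind with this approach is that .* is greedy, so it runs up to the last delimiter in the string, not the next one. That is fine for the two sample strings, which contain exactly two delimiters each, but it may matter for other data. The sketch below compares it with a variant that captures only up to the next delimiter; the extra test string 'A#B-C-D' and the names sample_data, str, greedy_part and first_part are illustrative assumptions, not part of the original post.

```sql
-- Sketch only: sample_data, str, greedy_part and first_part are hypothetical names.
WITH sample_data AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS str FROM dual UNION ALL
  SELECT 'A#B-C-D'                            FROM dual
)
SELECT str,
       -- Greedy: captures everything between the first and the last delimiter.
       REGEXP_SUBSTR(str, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS greedy_part,
       -- Negated class: captures from the first delimiter up to the next one
       -- (or to the end of the string if no further delimiter follows).
       REGEXP_SUBSTR(str, '[#ý-]([^#ý-]*)', 1, 1, NULL, 1) AS first_part
  FROM sample_data;
```

For the first row both columns should be 'MHK'; for 'A#B-C-D' the greedy pattern should return 'B-C' while the negated-class variant returns 'B'.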

sqlfiddle here
