Manipulating strings with regexp_substr

Question

I have an ETL task for data warehousing purposes. I need to extract the second part of a string, i.e. the part after the first occurrence of a delimiter such as '#', 'ý' or '-'. For example, with these test strings:

'Tori 1#MHK-MahallaKingaveKD': I should retrieve only 'MHK'.

'HPHelm2ýFFS-Tredddline': I should retrieve only 'FFS'.

I already tried the following CASE expression:

TRIM(CASE
       WHEN INSTR('HPHelm2ýFFS-Tredddline', '#', 1, 1) > 0
         THEN (REPLACE(
                 REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^#]+', 1, 2),
                 '#'
               ))
       ELSE (CASE
               WHEN INSTR('HPHelm2ýFFS-Tredddline', '-', 1, 1) > 0
                 THEN (REPLACE(
                         REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^-]+', 1, 2),
                         '-'
                       ))
               ELSE (CASE
                       WHEN INSTR('HPHelm2ýFFS-Tredddline', '-') = 0
                        AND INSTR('HPHelm2ýFFS-Tredddline', 'ý') = 0
                        AND INSTR('HPHelm2ýFFS-Tredddline', '#') = 0
                         THEN 'HPHelm2ýFFS-Tredddline'
                       ELSE (CASE
                               WHEN INSTR('HPHelm2ýFFS-Tredddline', 'ý', 1, 1) > 0
                                 THEN (REPLACE(
                                         REGEXP_SUBSTR('HPHelm2ýFFS-Tredddline', '[^ý]+', 1, 2),
                                         'ý'
                                       ))
                             END)
                     END)
             END)
     END)

Using the code above, I get:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK-MahallaKingaveKD'
'HPHelm2ýFFS-Tredddline' ====> 'FFS-Tredddline'

Expected output:

'Tori 1#MHK-MahallaKingaveKD' ====> 'MHK'
'HPHelm2ýFFS-Tredddline' ====> 'FFS'

So I have to exclude the '-' and everything after it.

I guess I should modify the regexp_substr pattern, but I can't find a clean solution, since '-' is also specified as a delimiter in the CASE WHEN branches.

Answer 1

I suggest retrieving the second occurrence of one or more chars other than your delimiter chars:

regexp_substr(col, '[^#ý-]+', 1, 2)

Here, the search starts with the first char in the record (1), and the second occurrence is returned (2).

The [^#ý-]+ pattern matches one or more (+) chars other than #, ý and -.
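
To see the pattern at work, here is a minimal sketch that runs it against the two test strings from the question, plus one row without any delimiter. The names sample_data, src, second_part and second_part_or_whole are illustrative and not from the question. The NVL fallback is an assumption about the desired behaviour: it mirrors the branch in the question's CASE expression that returns the whole string when no delimiter is present.

```sql
-- Illustrative sample rows: the first two come from the question,
-- the third is a made-up string with no delimiter at all.
WITH sample_data AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS src FROM dual UNION ALL
  SELECT 'HPHelm2ýFFS-Tredddline'             FROM dual UNION ALL
  SELECT 'NoDelimiterHere'                    FROM dual
)
SELECT src,
       -- second run of non-delimiter chars; NULL when there is no second run
       REGEXP_SUBSTR(src, '[^#ý-]+', 1, 2)           AS second_part,
       -- assumed fallback: return the whole string when no second run exists
       NVL(REGEXP_SUBSTR(src, '[^#ý-]+', 1, 2), src) AS second_part_or_whole
  FROM sample_data;
```

For the two strings from the question, second_part is 'MHK' and 'FFS'. For the delimiter-free row, second_part is NULL and second_part_or_whole falls back to the input. Because + needs at least one char, empty fields between adjacent delimiters are skipped rather than counted as occurrences.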

Answer 2

The following will give you what you're looking for:

WITH cteData AS (SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
                 SELECT 'HPHelm2ýFFS-Tredddline' FROM DUAL)
SELECT STRING, REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1) AS SUB_STRING
  FROM cteData;

The parentheses around the .* between the delimiter groups make the .* a sub-expression, and the final 1 in the parameter list tells REGEXP_SUBSTR to return the value of sub-expression #1. Since there's only one sub-expression in the regular expression, it returns the value of the .*, which is what you're looking for.
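
One thing to keep in mind with this approach is that .* is greedy, so the captured value runs up to the last delimiter in the string, not the next one. The sketch below compares the pattern above with a bounded variant that captures only non-delimiter chars. The second sample row is a hypothetical string with an extra '-', not one of the question's test cases, and the column names greedy and bounded are illustrative.

```sql
-- First row is from the question; the second row is a hypothetical
-- string with two '-' delimiters after the '#'.
WITH cteData AS (
  SELECT 'Tori 1#MHK-MahallaKingaveKD' AS STRING FROM DUAL UNION ALL
  SELECT 'Tori 1#MHK-Mahalla-KD'                 FROM DUAL
)
SELECT STRING,
       -- greedy: captures everything between the first and the LAST delimiter
       REGEXP_SUBSTR(STRING, '[#ý-](.*)[#ý-]', 1, 1, NULL, 1)      AS greedy,
       -- bounded: captures only up to the NEXT delimiter
       REGEXP_SUBSTR(STRING, '[#ý-]([^#ý-]*)[#ý-]', 1, 1, NULL, 1) AS bounded
  FROM cteData;
```

Both columns return 'MHK' for the first row. For the second row, greedy returns 'MHK-Mahalla' while bounded still returns 'MHK'. Both patterns need a closing delimiter after the captured part, so a string with only one delimiter yields NULL.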

