SQL Regular expression to split a column (string) to multiple rows based on delimiter '/n'

I have to split a column into multiple rows. The column stores a string, and it has to be split on the delimiter '/n'.

I have written the query below, but I am not able to specify ^[/n]. The other 'n' characters in the string are also being removed. Please help me parse the string.

WITH sample AS 
( SELECT 101 AS id, 
        'Name' test, 
        '3243243242342342/n12131212312/n123131232/n' as attribute_1, 
        'test value/nneenu not/nhoney' as attribute_2
   FROM DUAL 
) 
-- end of sample data
SELECT id,
       test,
       regexp_substr(attribute_1,'[^/n]+', 1, column_value),
       regexp_substr(attribute_2,'[^/]+', 1, column_value)
  FROM sample,
       TABLE(
         CAST(
           MULTISET(SELECT LEVEL 
                       FROM dual  
                    CONNECT BY LEVEL <= LENGTH(attribute_1) - LENGTH(replace(attribute_1, '/n')) + 1
                   ) AS sys.OdciNumberList
         )
       )
 WHERE regexp_substr(attribute_1,'[^/n]+', 1, column_value) IS NOT NULL
/

You need to use the class [[:cntrl:]]. The pattern '[^/n]+' is not right either: it is a negated character class, so it matches runs of characters that are neither '/' nor 'n'.
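
To see what '[^/n]+' actually does, here is a minimal sketch that runs it against the attribute_2 value from the question. It assumes an Oracle session; the column aliases occurrence and piece are only illustrative.

```sql
-- Sketch: '[^/n]+' matches runs of characters that are neither '/' nor 'n',
-- so it does not treat the two-character sequence '/n' as a single delimiter.
SELECT LEVEL AS occurrence,
       REGEXP_SUBSTR('test value/nneenu not/nhoney', '[^/n]+', 1, LEVEL) AS piece
  FROM dual
CONNECT BY LEVEL <= 4;
-- Occurrences 1 to 4 are 'test value', 'ee', 'u ' and 'ot':
-- every 'n' in the data acts as a delimiter, which is the behaviour reported in the question.
```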

The escape character is '\', and you cannot use [] to "wrap" special characters; use () (that is, grouping) instead.

If you want to skip line-break characters (e.g. '\n'), use [^[:cntrl:]] in the second parameter of regexp_substr.

More help: http://psoug.org/snippet/Regular-Expressions--Regexp-Cheat-Sheet_856.htm

Assumption

/n is supposed to mean \n, matching a newline (strictly speaking, in POSIX terms, an LF character, hex 0x0A).

If this assumption is wrong, use (^|/n)(([^/]|/+[^n])+) as your regex and extract the part of interest with regexp_substr(attribute_1,'(^|/n)(([^/]|/+[^n])+)', 1, column_value, '', 2).
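
As a rough illustration of that literal-'/n' variant, the sketch below applies the pattern to the attribute_2 value from the question. It assumes Oracle 11g or later (for the sixth, subexpression argument); the aliases occurrence and piece and the fixed bound of 3 are only for this sample.

```sql
-- Sketch: treat the two-character sequence '/n' as the delimiter.
-- Group 1 is the start of the string or a leading '/n';
-- group 2 is the element itself, selected by the last argument (2).
SELECT LEVEL AS occurrence,
       REGEXP_SUBSTR('test value/nneenu not/nhoney',
                     '(^|/n)(([^/]|/+[^n])+)', 1, LEVEL, NULL, 2) AS piece
  FROM dual
CONNECT BY LEVEL <= 3;  -- three elements in this sample string
```

NULL is passed for the match parameter here; in Oracle it behaves the same as the empty string '' used above.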

Solution

You cannot specify control characters in escape syntax within character classes. Using the POSIX character class [:cntrl:] works, but it also matches every other control character in that class; for practical purposes, TAB (#x09) might be a nuisance.

However, you can put any character into a regex character class by composing the pattern string from literals and calls to the chr function:

   -- ...
    '3243243242342342'||chr(13)||chr(10)||'12131212312'||chr(13)||chr(10)||'123131232'||chr(13)||chr(10) as attribute_1, 
    'test value'||chr(13)||chr(10)||'neenu not'||chr(13)||chr(10)||'honey' as attribute_2
   -- ...
   regexp_substr(attribute_1,'[^'||chr(13)||chr(10)||']+', 1, column_value),
   regexp_substr(attribute_2,'[^'||chr(13)||chr(10)||']+', 1, column_value)
   -- ...
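
Putting the fragments above back into the question's query might look like the sketch below. It assumes Oracle 11g or later, because it uses REGEXP_COUNT for the row bound instead of the LENGTH/REPLACE arithmetic; the aliases attribute_1_part and attribute_2_part are illustrative.

```sql
-- Sketch: split CR/LF-delimited values into rows.
WITH sample AS
( SELECT 101 AS id,
         'Name' AS test,
         '3243243242342342' || CHR(13) || CHR(10) || '12131212312' || CHR(13) || CHR(10) || '123131232' AS attribute_1,
         'test value' || CHR(13) || CHR(10) || 'neenu not' || CHR(13) || CHR(10) || 'honey' AS attribute_2
    FROM dual
)
SELECT id,
       test,
       REGEXP_SUBSTR(attribute_1, '[^' || CHR(13) || CHR(10) || ']+', 1, column_value) AS attribute_1_part,
       REGEXP_SUBSTR(attribute_2, '[^' || CHR(13) || CHR(10) || ']+', 1, column_value) AS attribute_2_part
  FROM sample,
       TABLE(
         CAST(
           MULTISET(SELECT LEVEL
                      FROM dual
                    -- one row per non-empty element of attribute_1
                    CONNECT BY LEVEL <= REGEXP_COUNT(attribute_1, '[^' || CHR(13) || CHR(10) || ']+')
                   ) AS sys.OdciNumberList
         )
       );
```

Because the class contains both CHR(13) and CHR(10), either character alone also ends an element, and empty elements are skipped rather than returned as NULL rows.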

You may want to try the following test queries in sqlplus (the CR/LFs are part of the literals; copy into a text editor, check that the CR/LFs are preserved, re-insert them if not, and paste the result into sqlplus):

select regexp_substr('adda
yxcv','[^'||CHR(10)||CHR(13)||']+', 1, 2) from dual;
select regexp_substr('ad'||CHR(9)||'da
yxcv','[^[:cntrl:]]+', 1, 2) from dual;
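
The same comparison can be written without relying on line breaks surviving copy and paste. The sketch below builds the test string with CHR() calls; the sample values and the aliases with_cntrl and with_lf are made up for illustration.

```sql
-- Sketch: TAB (CHR(9)) belongs to [:cntrl:], so it splits the first line as well.
SELECT REGEXP_SUBSTR('col1' || CHR(9) || 'col2' || CHR(10) || 'next',
                     '[^[:cntrl:]]+', 1, 2) AS with_cntrl,      -- 'col2'
       REGEXP_SUBSTR('col1' || CHR(9) || 'col2' || CHR(10) || 'next',
                     '[^' || CHR(10) || ']+', 1, 2) AS with_lf  -- 'next'
  FROM dual;
```

With [:cntrl:] the second element is the text after the TAB, whereas the class built from CHR(10) only breaks at the line feed.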
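
-- Split on CR (chr(13)) only. The CONNECT BY bound is the number of
-- delimiters (what remains after removing all non-CR runs) plus one.
with test as (select 'ABC' || chr(13) || 'DEF' || chr(13) || 'GHI' || chr(13) || 'JKL' || chr(13) || 'MNO' str from dual)
select regexp_substr (str, '[^' || chr(13) || ']+', 1, rownum) split
from test
connect by level <= length (regexp_replace (str, '[^' || chr(13) || ']+'))  + 1;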

The first choice would be to fix the data model, as data stored this way is not optimal. At any rate, try this version with some more test data. I tweaked the regexes:

WITH sample AS 
( SELECT 101 AS id, 
        'Name' test, 
        '3243243242342342/n12131212312/n123131232/n' as attribute_1, 
        'test value/nneenu not/nhoney' as attribute_2
   FROM DUAL 
) 
-- end of sample data
SELECT id,
       test,
       regexp_substr(attribute_1,'(.*?)(/n|$)', 1, column_value, NULL, 1),
       regexp_substr(attribute_2,'(.*?)(/n|$)', 1, column_value, NULL, 1)
  FROM sample,
       TABLE(
         CAST(
           MULTISET(SELECT LEVEL 
                       FROM dual  
                    --CONNECT BY LEVEL <= LENGTH(attribute_1) - LENGTH(replace(attribute_1, '/n')) + 1
                      -- Counts substrings ending with the delimiter.
                      CONNECT BY LEVEL <= REGEXP_COUNT(attribute_1, '.*?/n')                    
                   ) AS sys.OdciNumberList
         )
       )
 WHERE regexp_substr(attribute_1,'(.*?)(/n|$)', 1, column_value, NULL, 1) IS NOT NULL
/
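
To look at the tweaked pattern in isolation, here is a small sketch against a single literal (the attribute_2 value from the question). It assumes Oracle 11g or later; the column name str and the aliases are illustrative.

```sql
-- Sketch: '(.*?)(/n|$)' lazily captures everything up to the next '/n'
-- or the end of the string; the last argument (1) returns only that capture.
WITH sample AS
( SELECT 'test value/nneenu not/nhoney' AS str FROM dual )
SELECT LEVEL AS occurrence,
       REGEXP_SUBSTR(str, '(.*?)(/n|$)', 1, LEVEL, NULL, 1) AS piece
  FROM sample
CONNECT BY LEVEL <= REGEXP_COUNT(str, '/n') + 1;  -- delimiters + 1 elements
```

The bound here is the number of delimiters plus one because this string has no trailing '/n'; the query above instead counts delimiter-terminated substrings, which suits attribute_1 with its trailing '/n'.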
