
Teradata SQL regular expression - dealing with consecutive delimiters

I am trying to use regexp_substr to split table data held in one cell into its individual fields.

The data is comma-delimited. Individual fields can also contain commas inside quotes, and finally some fields can be unpopulated.

My sample logic works for the first two requirements, but I can't sort out the third. Please help!

The issue is that b4 should be null, but it is being returned as F.

SEL
'a, b, c,, F,"d, e, f", g, h' AS f1,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,1) AS b1,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,2) AS b2,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,3) AS b3,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,4) AS b4,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,5) AS b5,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,6) AS b6,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,7) AS b7,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,8) AS b8,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,9) AS b9,
RegExp_Substr(f1,'(".*?"|[^",\s]+)(?=,|$)',1,10) AS b10
;

Thanks

JF

Your input looks like CSV data. If the number of columns is constant, you can use the CSVLD table function:

WITH cte AS 
 (  -- base select here
   SELECT 'a, b, c,, F,"d, e, f", g, h'  as f1
   --FROM mytable
 )
SELECT *
FROM TABLE
 (
   CSVLD
    (cte.f1  -- input column
    ,','     -- delimiter character
    ,'"'     -- quote character
    )
   RETURNS
    (
      b1  VarChar(11) CHARACTER SET Unicode
     ,b2  VarChar(11) CHARACTER SET Unicode
     ,b3  VarChar(11) CHARACTER SET Unicode
     ,b4  VarChar(11) CHARACTER SET Unicode
     ,b5  VarChar(11) CHARACTER SET Unicode
     ,b6  VarChar(11) CHARACTER SET Unicode
     ,b7  VarChar(11) CHARACTER SET Unicode
     ,b8  VarChar(11) CHARACTER SET Unicode
    )
 ) AS t
;

If your input column is LATIN, remove the CHARACTER SET Unicode from the output columns.
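For comparison outside the database, here is a minimal sketch using Python's standard csv module. It is not CSVLD and not Teradata code; it only shows what a generic CSV parser does with the sample string when given the same delimiter and quote characters. The helper name split_csv_line is made up for this illustration, and how CSVLD itself represents the empty field or the leading spaces is not something this sketch can tell you.

```python
import csv

SAMPLE = 'a, b, c,, F,"d, e, f", g, h'


def split_csv_line(line, delimiter=',', quotechar='"'):
    """Parse one CSV line into a list of fields (hypothetical helper)."""
    reader = csv.reader([line], delimiter=delimiter, quotechar=quotechar)
    return next(reader)


fields = split_csv_line(SAMPLE)

# The empty field between the two consecutive commas keeps its position,
# and the quoted field is returned as one value without its quotes.
print(len(fields))   # 8
print(fields[3])     # empty string
print(fields[5])     # d, e, f
```

Note that this parser keeps the space after each comma (for example ' b'), so a trim step may still be needed afterwards.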

This regex works for your sample case:

(?:,|^)?(".*?"|[^,]*)

You'll need to use the second parenthesized group of the match (the field itself) rather than the first, which only consumes the leading comma.
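To see why the pattern in the question skips the empty field while this one does not, here is a small sketch in Python's re module. This is an assumption-laden stand-in: Teradata's regex engine may differ in details, and the two function names are invented for the illustration. In the question's pattern every alternative needs at least one character, so nothing matches between the two commas and the occurrence count slides up by one. Here `[^,]*` can match zero characters, so the empty field still counts as an occurrence.

```python
import re

SAMPLE = 'a, b, c,, F,"d, e, f", g, h'

# Pattern from the question: "[^\",\\s]+" needs at least one character.
QUESTION_PATTERN = r'(".*?"|[^",\s]+)(?=,|$)'

# Pattern from this answer: "[^,]*" may match an empty field.
ANSWER_PATTERN = r'(?:,|^)?(".*?"|[^,]*)'


def fields_from_question_pattern(text):
    """Return every whole match, in occurrence order (hypothetical helper)."""
    return [m.group(0) for m in re.finditer(QUESTION_PATTERN, text)]


def fields_from_answer_pattern(text):
    """Return the capturing group of every match (hypothetical helper)."""
    fields = []
    for m in re.finditer(ANSWER_PATTERN, text):
        # Some engines report an extra zero-width match at the end of the
        # string; it is not a field, so skip it.
        if m.group(0) == '':
            continue
        # Strip the space that follows each comma in the sample data.
        fields.append(m.group(1).strip())
    return fields


print(fields_from_question_pattern(SAMPLE))
# ['a', 'b', 'c', 'F', '"d, e, f"', 'g', 'h']   <- 4th occurrence is F

print(fields_from_answer_pattern(SAMPLE))
# ['a', 'b', 'c', '', 'F', '"d, e, f"', 'g', 'h']   <- 4th occurrence is empty
```

The quoted field still carries its quotes in both cases, and in this sketch the empty field comes back as an empty string rather than NULL, so further handling may be needed depending on what you expect.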
