Using SSIS or T-SQL, split a column of quoted and unquoted comma-separated values into multiple columns

I have comma-separated data in a column named C0. The data in C0 looks like this:

C0
"Pacey LLC.",213830ZZ,11/1/2017,11/1/2017,"297,311.74","2,371.40",0.00,"1,325.18",0.00,42.22,"123,986.56"
Mike The Miker,9814140VCD,12/1/2018,12/1/2018,"3,917,751.99","419,743.54","36,642.66","344,090.43",0.00,10.00,"2,434,671.06"

And I want it to end up like this:

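+----------------+------------+-----------+-----------+--------------+------------+-----------+------------+------+-------+--------------+
| F1             | F2         | F3        | F4        | F5           | F6         | F7        | F8         | F9   | F10   | F11          |
+----------------+------------+-----------+-----------+--------------+------------+-----------+------------+------+-------+--------------+
| "Pacey LLC."   | 213830ZZ   | 11/1/2017 | 11/1/2017 | 297,311.74   | 2,371.40   | 0.00      | 1,325.18   | 0.00 | 42.22 | 123,986.56   |
+----------------+------------+-----------+-----------+--------------+------------+-----------+------------+------+-------+--------------+
| Mike The Miker | 9814140VCD | 12/1/2018 | 12/1/2018 | 3,917,751.99 | 419,743.54 | 36,642.66 | 344,090.43 | 0.00 | 10.00 | 2,434,671.06 |
+----------------+------------+-----------+-----------+--------------+------------+-----------+------------+------+-------+--------------+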

I've tried nested REPLACE calls, but couldn't find a pattern to search for reliably without regex, which T-SQL doesn't have. I've also tried a TOKEN approach in SSIS by this feller, but neither was fruitful.

The nested REPLACE approaches got stuck on the money fields that are under 1,000 (like 0.00), and the SSIS TOKEN approach presumes all fields are quote-delimited, which in my example they aren't.

As you were told already, T-SQL is the wrong tool for this. Nevertheless, this can be done (at least for the set given). If this is a one-time action, you might give it a try. If this is a recurring task in a real-life scenario, I'd try to get the data in an appropriate format.

However, this would work for the given lines:

DECLARE @t1 TABLE(ID INT IDENTITY, YourString NVARCHAR(1000));
INSERT INTO @t1 VALUES(N'"Pacey LLC.",213830ZZ,11/1/2017,11/1/2017,"297,311.74","2,371.40",0.00,"1,325.18",0.00,42.22,"123,986.56"')
                     ,(N'Mike The Miker,9814140VCD,12/1/2018,12/1/2018,"3,917,751.99","419,743.54","36,642.66","344,090.43",0.00,10.00,"2,434,671.06"');

--Your data includes dates in a culture-specific format (a really bad idea)
--Better switch to ISO8601
--Setting the date format will help, but is NOT recommended

SET DATEFORMAT dmy;

--The first CTE uses APPLY together with a computed TOP()
--This returns each single character, one by one.

WITH singleChars AS
(                    
SELECT t.ID
      ,A.Pos
      ,SUBSTRING(t.YourString,A.Pos,1) AS CharOnPos
FROM @t1 t
CROSS APPLY(SELECT TOP (LEN(t.YourString)) ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values) A(Pos) --master..spt_values can be any table with sufficient rows
)

--We continue with a recursive CTE
--It runs through the string and tracks whether we are within a quoted area or not

,recCTE AS
(
    SELECT *
          ,CASE WHEN CharOnPos='"' THEN 1 ELSE 0 END AS QuoteIsOpen
          ,CAST(CharOnPos AS NVARCHAR(MAX)) AS GrowingString
    FROM singleChars WHERE Pos=1

    UNION ALL

    SELECT sc.ID,sc.Pos,sc.CharOnPos
          ,A.QuoteIsStillOpen
          ,CONCAT(GrowingString,CASE WHEN sc.CharOnPos=N',' AND A.QuoteIsStillOpen=0 THEN N'$%&' ELSE sc.CharOnPos END)
    FROM singleChars sc
    INNER JOIN recCTE r ON sc.ID = r.ID AND sc.Pos=r.Pos+1 
    CROSS APPLY(VALUES(CASE WHEN sc.CharOnPos='"' THEN CASE WHEN r.QuoteIsOpen=1 THEN 0 ELSE 1 END ELSE r.QuoteIsOpen END )) A(QuoteIsStillOpen)
)

--This CTE uses a trick: TOP 1 WITH TIES together with ORDER BY a partitioned ROW_NUMBER()
--The result includes only the final string of the recursion for each ID

,newlySeparated AS
(
    SELECT TOP 1 WITH TIES * FROM recCTE
    ORDER BY ROW_NUMBER() OVER(PARTITION BY ID ORDER BY Pos DESC)
)

--The final SELECT uses a trick to split strings position- and type-safe

SELECT A.*
FROM newlySeparated ns
CROSS APPLY OPENJSON(CONCAT(N'[["',REPLACE(REPLACE(ns.GrowingString,'"',''),'$%&','","'),N'"]]'))
WITH(Company    NVARCHAR(100)        '$[0]'
    ,Code1      NVARCHAR(100)        '$[1]'
    ,Date1      DATE                 '$[2]'
    ,Date2      DATE                 '$[3]'
    ,Decimal1   NVARCHAR(100)        '$[4]' --Using a numeric type might work here; this depends on your machine
    ,Decimal2   NVARCHAR(100)        '$[5]'
    ,Decimal3   NVARCHAR(100)        '$[6]'
    ,Decimal4   NVARCHAR(100)        '$[7]'
    ,Decimal5   NVARCHAR(100)        '$[8]'
    ,Decimal6   NVARCHAR(100)        '$[9]'
    ,Decimal7   NVARCHAR(100)        '$[10]') A
OPTION(MAXRECURSION 0);
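
To see the last step in isolation, here is a minimal sketch of the OPENJSON split on a single hard-coded string. The variable @s and its value are made up for illustration; the '$%&' placeholder and the column names follow the query above. It assumes SQL Server 2016 or later (database compatibility level 130 or higher), which OPENJSON requires.

```sql
-- Hypothetical sample: '$%&' marks the real separators, as in the query above.
-- The comma inside 297,311.74 is left alone because it is not a separator.
DECLARE @s NVARCHAR(MAX) = N'Pacey LLC.$%&213830ZZ$%&297,311.74';

-- Turn the string into a JSON array nested in an outer array:
-- [["Pacey LLC.","213830ZZ","297,311.74"]]
SELECT A.*
FROM OPENJSON(CONCAT(N'[["', REPLACE(@s, N'$%&', N'","'), N'"]]'))
WITH (Company NVARCHAR(100) '$[0]'   -- first fragment
     ,Code1   NVARCHAR(100) '$[1]'   -- second fragment
     ,Amount  NVARCHAR(100) '$[2]'   -- third fragment
     ) A;
```

The doubled brackets make OPENJSON return one row (the inner array), so the WITH clause can address each fragment by position. If a value could contain a double quote or a backslash, wrapping it in STRING_ESCAPE(..., 'json') before building the array would probably be the safer route; the query above sidesteps this by removing all quotes first.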

The result:

+----------------+------------+------------+------------+--------------+------------+-----------+------------+------+-------+--------------+
| Pacey LLC.     | 213830ZZ   | 2017-01-11 | 2017-01-11 | 297,311.74   | 2,371.40   | 0.00      | 1,325.18   | 0.00 | 42.22 | 123,986.56   |
+----------------+------------+------------+------------+--------------+------------+-----------+------------+------+-------+--------------+
| Mike The Miker | 9814140VCD | 2018-01-12 | 2018-01-12 | 3,917,751.99 | 419,743.54 | 36,642.66 | 344,090.43 | 0.00 | 10.00 | 2,434,671.06 |
+----------------+------------+------------+------------+--------------+------------+-----------+------------+------+-------+--------------+
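
Since the amounts come back as NVARCHAR and the date columns depend on SET DATEFORMAT, a possible follow-up is to convert explicitly. The sketch below is an assumption about how you might do that, not part of the query above: it strips the thousands separators before converting to DECIMAL, and it reads the same date string once with style 101 and once with style 103. The literals are taken from the sample rows; the column aliases are made up.

```sql
SELECT
     -- remove the thousands separators, then convert to a numeric type
     TRY_CONVERT(DECIMAL(18,2), REPLACE(N'3,917,751.99', N',', N'')) AS Amount
     -- style 101 reads the string as mm/dd/yyyy (U.S.)
    ,TRY_CONVERT(DATE, N'11/1/2017', 101) AS ReadAsMonthFirst
     -- style 103 reads the string as dd/mm/yyyy (British/French)
    ,TRY_CONVERT(DATE, N'11/1/2017', 103) AS ReadAsDayFirst;
```

TRY_CONVERT returns NULL instead of raising an error when a value cannot be converted, which makes bad rows easy to find. Which style is correct depends on where the data comes from; the result above reflects the day-first (dmy) reading.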
