Split text into multiple lines based on pipe and caret delimiters - Oracle PL/SQL Pipelined Function
I have a table:
CREATE TABLE "text_file"
( "SEQ" NUMBER,
"SPLIT_VALUE" CLOB
);
The content of the table is:
SEQ SPLIT_VALUE
1 MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292|
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105
OBX|2|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^172|mg/dl|70_105
2 MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01
PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292|
OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730
OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105
OBX|2|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^172|mg/dl|70_105
Please note - segment names such as MSH, OBR, OBX or LX can be either 3 or 2 characters long. So the best way would be to take the segment name as the text before the first pipe.
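As a small illustration of taking the segment name from the text before the first pipe, plain SUBSTR and INSTR should be enough. The two sample lines below are made up for the example; a line without any pipe would yield NULL here.

```sql
-- Sample lines are invented for illustration only.
SELECT line,
       SUBSTR(line, 1, INSTR(line, '|') - 1) AS segment_name
FROM  (SELECT 'OBX|1|SN' AS line FROM dual
       UNION ALL
       SELECT 'LX|1' FROM dual);
```

This works for 2-character and 3-character names alike, because nothing about the length is hard-coded.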
I am looking to split the string in split_value into multiple rows under the following conditions: each segment is split into its elements at the pipe, and if an element contains ^, then it would break down even further, e.g. MSH08-01, MSH08-02.
Please note - there is an exception for segment MSH. For MSH, the first element is | and the second one is ^~\&.
The first rows of the output would look like this:
SEQ SPLIT_SEQ SEG_SEQ SPLIT_SEQ_VALUE
1 MSH00 1 MSH
1 MSH01 1 |
1 MSH02 1 ^~\&
1 MSH03 1 GHH LAB
1 MSH04 1 ELAB-3
Please note - I have around 90,000 rows in the text_file table, so the solution should be able to process 90,000 rows efficiently.
The complete output is:
SEQ SPLIT_SEQ SEG_SEQ SPLIT_SEQ_VALUE
1 MSH00 1 MSH
1 MSH01 1 |
1 MSH02 1 ^~\&
1 MSH03 1 GHH LAB
1 MSH04 1 ELAB-3
1 MSH05 1 GHH OE
1 MSH06 1 BLDG4
1 MSH07 1 200202150930
1 MSH08 1
1 MSH09-01 1 ORU
1 MSH09-02 1 R01
1 PID00 1 PID
1 PID01 1
1 PID02 1
1 PID03 1 555-44-4444
1 PID04 1
1 PID05-01 1 EVERYWOMAN
1 PID05-02 1 EVE
1 PID05-03 1 E
1 PID05-04 1
1 PID05-05 1
1 PID05-06 1
1 PID05-07 1 L
1 PID06 1 JONES
1 PID07 1 19620320
1 PID08 1 F
1 PID09 1
1 PID10 1
1 PID11-01 1 153 FERNWOOD DR.
1 PID11-02 1
1 PID11-03 1 STATESVILLE
1 PID11-04 1 OH
1 PID11-05 1 35292
1 PID12 1
1 OBR00 1 OBR
1 OBR01 1 1
1 OBR02-01 1 845439
1 OBR02-02 1 GHH OE
1 OBR03-01 1 1045813
1 OBR03-02 1 GHH LAB
1 OBR04-01 1 15545
1 OBR04-02 1 GLUCOSE
1 OBR05 1
1 OBR06 1
1 OBR07 1 200202150730
1 OBX00 1 OBX
1 OBX01 1 1
1 OBX02 1 SN
1 OBX03-01 1 1554-5
1 OBX03-02 1 GLUCOSE
1 OBX03-03 1 POST 12H CFST:MCNC:PT:SER/PLAS:QN
1 OBX04 1
1 OBX05-01 1
1 OBX05-02 1 182
1 OBX06 1 mg/dl
1 OBX07 1 70_105
1 OBX00 2 OBX
1 OBX01 2 2
1 OBX02 2 SN
1 OBX03-01 2 1554-5
1 OBX03-02 2 GLUCOSE
1 OBX03-03 2 POST 12H CFST:MCNC:PT:SER/PLAS:QN
1 OBX04 2
1 OBX05-01 2
1 OBX05-02 2 172
1 OBX06 2 mg/dl
1 OBX07 2 70_105
2 MSH00 1 MSH
2 MSH01 1 |
2 MSH02 1 ^~\&
2 MSH03 1 GHH LAB
2 MSH04 1 ELAB-3
2 MSH05 1 GHH OE
2 MSH06 1 BLDG4
2 MSH07 1 200202150930
2 MSH08 1
2 MSH09-01 1 ORU
2 MSH09-02 1 R01
2 PID00 1 PID
2 PID01 1
2 PID02 1
2 PID03 1 555-44-4444
2 PID04 1
2 PID05-01 1 EVERYWOMAN
2 PID05-02 1 EVE
2 PID05-03 1 E
2 PID05-04 1
2 PID05-05 1
2 PID05-06 1
2 PID05-07 1 L
2 PID06 1 JONES
2 PID07 1 19620320
2 PID08 1 F
2 PID09 1
2 PID10 1
2 PID11-01 1 153 FERNWOOD DR.
2 PID11-02 1
2 PID11-03 1 STATESVILLE
2 PID11-04 1 OH
2 PID11-05 1 35292
2 PID12 1
2 OBR00 1 OBR
2 OBR01 1 1
2 OBR02-01 1 845439
2 OBR02-02 1 GHH OE
2 OBR03-01 1 1045813
2 OBR03-02 1 GHH LAB
2 OBR04-01 1 15545
2 OBR04-02 1 GLUCOSE
2 OBR05 1
2 OBR06 1
2 OBR07 1 200202150730
2 OBX00 1 OBX
2 OBX01 1 1
2 OBX02 1 SN
2 OBX03-01 1 1554-5
2 OBX03-02 1 GLUCOSE
2 OBX03-03 1 POST 12H CFST:MCNC:PT:SER/PLAS:QN
2 OBX04 1
2 OBX05-01 1
2 OBX05-02 1 182
2 OBX06 1 mg/dl
2 OBX07 1 70_105
2 OBX00 2 OBX
2 OBX01 2 2
2 OBX02 2 SN
2 OBX03-01 2 1554-5
2 OBX03-02 2 GLUCOSE
2 OBX03-03 2 POST 12H CFST:MCNC:PT:SER/PLAS:QN
2 OBX04 2
2 OBX05-01 2
2 OBX05-02 2 172
2 OBX06 2 mg/dl
2 OBX07 2 70_105
I believe that a PL/SQL pipelined function would be the best way.
Any help would be appreciated.
This is PL/SQL, and assuming your string can also be of arbitrary length (i.e. more than 32K), you should use a table function along with the dbms_lob package to parse it and then return multiple rows.
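A minimal sketch of such a table function, written as a pipelined function over the CLOB. Everything named here (hl7_piece_t, hl7_piece_tab, split_hl7) is hypothetical and not from the question. It assumes that segments are separated by a line feed (optionally preceded by a carriage return), that one segment fits into 32767 characters, that a single value fits into 4000 characters, and that segment names are at most 10 characters. It is not tuned or benchmarked against the 90,000 rows.

```sql
-- Hypothetical names throughout: hl7_piece_t, hl7_piece_tab, split_hl7.
CREATE OR REPLACE TYPE hl7_piece_t AS OBJECT (
  seq             NUMBER,
  split_seq       VARCHAR2(30),
  seg_seq         NUMBER,
  split_seq_value VARCHAR2(4000)
);
/
CREATE OR REPLACE TYPE hl7_piece_tab AS TABLE OF hl7_piece_t;
/
CREATE OR REPLACE FUNCTION split_hl7 (p_seq IN NUMBER, p_text IN CLOB)
  RETURN hl7_piece_tab PIPELINED
IS
  TYPE count_map IS TABLE OF PLS_INTEGER INDEX BY VARCHAR2(10);
  l_seen  count_map;          -- how often each segment name has occurred so far
  l_len   PLS_INTEGER := NVL(DBMS_LOB.GETLENGTH(p_text), 0);
  l_pos   PLS_INTEGER := 1;   -- start of the current line inside the CLOB
  l_eol   PLS_INTEGER;        -- position of the line feed that ends it
  l_line  VARCHAR2(32767);
  l_seg   VARCHAR2(10);
  l_field VARCHAR2(32767);
  l_fs    PLS_INTEGER;        -- field start
  l_fe    PLS_INTEGER;        -- field end (position of the next '|')
  l_fno   PLS_INTEGER;        -- field number
  l_cs    PLS_INTEGER;        -- component start
  l_ce    PLS_INTEGER;        -- component end (position of the next '^')
  l_cno   PLS_INTEGER;        -- component number
BEGIN
  WHILE l_pos <= l_len LOOP
    -- Assumption: segments end with LF, optionally preceded by CR.
    l_eol := DBMS_LOB.INSTR(p_text, CHR(10), l_pos);
    IF NVL(l_eol, 0) = 0 THEN
      l_eol := l_len + 1;
    END IF;
    -- Assumption: one segment fits into 32767 characters.
    l_line := RTRIM(DBMS_LOB.SUBSTR(p_text, LEAST(l_eol - l_pos, 32767), l_pos), CHR(13));
    l_pos  := l_eol + 1;
    l_fs   := INSTR(l_line, '|') + 1;
    IF l_fs > 2 THEN          -- skips blank lines and lines without a segment name
      l_seg := SUBSTR(l_line, 1, LEAST(l_fs - 2, 10));
      IF l_seen.EXISTS(l_seg) THEN
        l_seen(l_seg) := l_seen(l_seg) + 1;
      ELSE
        l_seen(l_seg) := 1;
      END IF;
      PIPE ROW (hl7_piece_t(p_seq, l_seg || '00', l_seen(l_seg), l_seg));
      l_fno := 0;
      IF l_seg = 'MSH' THEN   -- MSH exception: element 1 is the separator itself
        PIPE ROW (hl7_piece_t(p_seq, 'MSH01', l_seen(l_seg), '|'));
        l_fno := 1;
      END IF;
      WHILE l_fs <= LENGTH(l_line) + 1 LOOP
        l_fe := INSTR(l_line, '|', l_fs);
        IF l_fe = 0 THEN
          l_fe := LENGTH(l_line) + 1;
        END IF;
        l_field := SUBSTR(l_line, l_fs, l_fe - l_fs);
        l_fno   := l_fno + 1;
        l_fs    := l_fe + 1;
        -- MSH element 2 (^~\ and ampersand) is kept whole although it contains '^'.
        IF NVL(INSTR(l_field, '^'), 0) = 0 OR (l_seg = 'MSH' AND l_fno = 2) THEN
          PIPE ROW (hl7_piece_t(p_seq, l_seg || TO_CHAR(l_fno, 'FM00'),
                                l_seen(l_seg), l_field));
        ELSE
          l_cs  := 1;
          l_cno := 0;
          LOOP
            l_ce := INSTR(l_field, '^', l_cs);
            IF l_ce = 0 THEN
              l_ce := LENGTH(l_field) + 1;
            END IF;
            l_cno := l_cno + 1;
            PIPE ROW (hl7_piece_t(p_seq,
                        l_seg || TO_CHAR(l_fno, 'FM00') || '-' || TO_CHAR(l_cno, 'FM00'),
                        l_seen(l_seg),
                        SUBSTR(l_field, l_cs, l_ce - l_cs)));
            EXIT WHEN l_ce > LENGTH(l_field);
            l_cs := l_ce + 1;
          END LOOP;
        END IF;
      END LOOP;
    END IF;
  END LOOP;
  RETURN;
END split_hl7;
/
-- Usage against the table from the question (note the quoted lower-case name).
SELECT p.seq, p.split_seq, p.seg_seq, p.split_seq_value
FROM   "text_file" t,
       TABLE(split_hl7(t."SEQ", t."SPLIT_VALUE")) p;
```

Because the rows are piped out as they are parsed, the result for a message does not have to be built up in memory first. The function emits rows in document order, but the query itself carries no ORDER BY, so an ordering column would have to be added if the order must be guaranteed.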
Blob Journey from Web to DB is a general article that shows how to manipulate blobs from the web point of view, but the approach there is the same. See the section around [Selecting Data]. It simply splits at 4000 bytes, whereas your split logic will have to take the | into account. The idea is the same, though.
Later on, see the [table] usage along with PL/SQL.
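To make the remark about the | concrete, here is a rough sketch of reading a CLOB piece by piece while ending each piece at the last pipe inside it, instead of cutting at a fixed size. It is not the article's code. The sample text and the variable names are invented for illustration, and the piece size is set to 16 characters only so that the effect is visible on a short string; 4000 would be the realistic value.

```sql
-- Sketch only: delimiter-aware chunking of a CLOB with DBMS_LOB.SUBSTR.
DECLARE
  l_text  CLOB := 'OBX|1|SN|1554-5^GLUCOSE||^182|mg/dl|70_105';  -- sample data
  l_max   CONSTANT PLS_INTEGER := 16;  -- 4000 in practice; small here for the demo
  l_len   PLS_INTEGER := DBMS_LOB.GETLENGTH(l_text);
  l_pos   PLS_INTEGER := 1;
  l_piece VARCHAR2(4000);
  l_cut   PLS_INTEGER;
BEGIN
  WHILE l_pos <= l_len LOOP
    l_piece := DBMS_LOB.SUBSTR(l_text, l_max, l_pos);
    IF l_pos + LENGTH(l_piece) <= l_len THEN   -- more text follows this piece
      l_cut := INSTR(l_piece, '|', -1);        -- last pipe inside the piece
      IF l_cut > 0 THEN
        l_piece := SUBSTR(l_piece, 1, l_cut);  -- end the piece right after that pipe
      END IF;
    END IF;
    DBMS_OUTPUT.PUT_LINE(l_piece);
    l_pos := l_pos + LENGTH(l_piece);
  END LOOP;
END;
/
```

If a piece contains no pipe at all, it is passed through at full size, so the loop always moves forward. The output is only visible with server output enabled in the client.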