Split text into multiple lines based on pipe and cap delimiter - Oracle PL/SQL Pipelined Function

I have a table:

 CREATE TABLE "text_file"
( "SEQ" NUMBER,
"SPLIT_VALUE" CLOB
)

The content of the table is:

SEQ       SPLIT_VALUE
1         MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01
          PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292|
          OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730
          OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105
          OBX|2|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^172|mg/dl|70_105

2         MSH|^~\&|GHH LAB|ELAB-3|GHH OE|BLDG4|200202150930||ORU^R01
          PID|||555-44-4444||EVERYWOMAN^EVE^E^^^^L|JONES|19620320|F|||153 FERNWOOD DR.^^STATESVILLE^OH^35292|
          OBR|1|845439^GHH OE|1045813^GHH LAB|15545^GLUCOSE|||200202150730
          OBX|1|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^182|mg/dl|70_105
          OBX|2|SN|1554-5^GLUCOSE^POST 12H CFST:MCNC:PT:SER/PLAS:QN||^172|mg/dl|70_105

Please note - segment names such as MSH, OBR, OBX, LX can be 3 characters or 2 characters long. So the best way would be to take the segment name as whatever comes before the first pipe.

I am looking to split the string in split_value into multiple rows according to the following rules:

  • SEQ -- taken from the first column, SEQ.
  • SPLIT_SEQ -- built from the first word before | (for example MSH, OBR, OBX, LX) followed by a field sequence starting from 00. If a field contains a cap ^, it is broken down even further, for example MSH09-01, MSH09-02.

Please note - there is an exception for segment MSH. For MSH, the first element is | and the second one is ^~\&

SEQ SPLIT_SEQ   SEG_SEQ SPLIT_SEQ_VALUE
1   MSH00       1       MSH
1   MSH01       1       |
1   MSH02       1       ^~\&
1   MSH03       1       GHH LAB
1   MSH04       1       ELAB-3
  • SEG_SEQ -- if the segment (the first word before |) is repeated within the same SEQ, increase it. So, if OBX appears twice, the rows of the first OBX get 1, the rows of the second OBX get 2, and so on.
  • SPLIT_SEQ_VALUE -- the value taken from the message above.

Please note - I have around 90,000 rows in the text_file table, so the solution should be able to process 90,000 rows efficiently.

The complete output is:

SEQ SPLIT_SEQ   SEG_SEQ SPLIT_SEQ_VALUE
1   MSH00       1       MSH
1   MSH01       1       |
1   MSH02       1       ^~\&
1   MSH03       1       GHH LAB
1   MSH04       1       ELAB-3
1   MSH05       1       GHH OE
1   MSH06       1       BLDG4
1   MSH07       1       200202150930
1   MSH08       1       
1   MSH09-01    1       ORU
1   MSH09-02    1       R01
1   PID00       1       PID
1   PID01       1       
1   PID02       1       
1   PID03       1       555-44-4444
1   PID04       1       
1   PID05-01    1       EVERYWOMAN
1   PID05-02    1       EVE
1   PID05-03    1       E
1   PID05-04    1   
1   PID05-05    1   
1   PID05-06    1   
1   PID05-07    1       L
1   PID06       1       JONES
1   PID07       1       19620320
1   PID08       1       F
1   PID09       1       
1   PID10       1       
1   PID11-01    1       153 FERNWOOD DR.
1   PID11-02    1   
1   PID11-03    1       STATESVILLE
1   PID11-04    1       OH
1   PID11-05    1       35292
1   PID12       1   
1   OBR00       1       OBR
1   OBR01       1       1
1   OBR02-01    1       845439
1   OBR02-02    1       GHH OE
1   OBR03-01    1       1045813
1   OBR03-02    1       GHH LAB
1   OBR04-01    1       15545
1   OBR04-02    1       GLUCOSE
1   OBR05       1   
1   OBR06       1   
1   OBR07       1       200202150730
1   OBX00       1       OBX
1   OBX01       1       1
1   OBX02       1       SN
1   OBX03-01    1       1554-5
1   OBX03-02    1       GLUCOSE
1   OBX03-03    1       POST 12H CFST:MCNC:PT:SER/PLAS:QN
1   OBX04       1       
1   OBX05-01    1       
1   OBX05-02    1       182
1   OBX06       1       mg/dl
1   OBX07       1       70_105
1   OBX00       2       OBX
1   OBX01       2       2
1   OBX02       2       SN
1   OBX03-01    2       1554-5
1   OBX03-02    2       GLUCOSE
1   OBX03-03    2       POST 12H CFST:MCNC:PT:SER/PLAS:QN
1   OBX04       2           
1   OBX05-01    2       
1   OBX05-02    2       172
1   OBX06       2       mg/dl
1   OBX07       2       70_105

2   MSH00       1       MSH
2   MSH01       1       |
2   MSH02       1       ^~\&
2   MSH03       1       GHH LAB
2   MSH04       1       ELAB-3
2   MSH05       1       GHH OE
2   MSH06       1       BLDG4
2   MSH07       1       200202150930
2   MSH08       1       
2   MSH09-01    1       ORU
2   MSH09-02    1       R01
2   PID00       1       PID
2   PID01       1       
2   PID02       1       
2   PID03       1       555-44-4444
2   PID04       1       
2   PID05-01    1       EVERYWOMAN
2   PID05-02    1       EVE
2   PID05-03    1       E
2   PID05-04    1   
2   PID05-05    1   
2   PID05-06    1   
2   PID05-07    1       L
2   PID06       1       JONES
2   PID07       1       19620320
2   PID08       1       F
2   PID09       1       
2   PID10       1       
2   PID11-01    1       153 FERNWOOD DR.
2   PID11-02    1   
2   PID11-03    1       STATESVILLE
2   PID11-04    1       OH
2   PID11-05    1       35292
2   PID12       1   
2   OBR00       1       OBR
2   OBR01       1       1
2   OBR02-01    1       845439
2   OBR02-02    1       GHH OE
2   OBR03-01    1       1045813
2   OBR03-02    1       GHH LAB
2   OBR04-01    1       15545
2   OBR04-02    1       GLUCOSE
2   OBR05       1   
2   OBR06       1   
2   OBR07       1       200202150730
2   OBX00       1       OBX
2   OBX01       1       1
2   OBX02       1       SN
2   OBX03-01    1       1554-5
2   OBX03-02    1       GLUCOSE
2   OBX03-03    1       POST 12H CFST:MCNC:PT:SER/PLAS:QN
2   OBX04       1       
2   OBX05-01    1       
2   OBX05-02    1       182
2   OBX06       1       mg/dl
2   OBX07       1       70_105
2   OBX00       2       OBX
2   OBX01       2       2
2   OBX02       2       SN
2   OBX03-01    2       1554-5
2   OBX03-02    2       GLUCOSE
2   OBX03-03    2       POST 12H CFST:MCNC:PT:SER/PLAS:QN
2   OBX04       2           
2   OBX05-01    2       
2   OBX05-02    2       172
2   OBX06       2       mg/dl
2   OBX07       2       70_105

I believe that a PL/SQL pipelined function would be the best way.

Any help would be appreciated.

Since this is PL/SQL, and assuming your string can be of arbitrary length (i.e. more than 32K), you should use a table function together with the dbms_lob package to parse it and return multiple rows.

Blob Journey from Web to DB is a general article that shows how to manipulate blobs from the web point of view, but the approach is the same. See the section around [Selecting Data]. That example simply splits at 4000 bytes, whereas your split logic will have to take the | delimiter into account. The idea is the same, though.
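As a rough illustration of that idea, here is a minimal sketch of a pipelined table function that walks the CLOB with dbms_lob and splits on | and ^. The names hl7_field_row, hl7_field_tab and split_hl7 are made up for this example. It also assumes that segments are separated by line feeds (CHR(10)), that each segment line is well under 32K, that each value fits in VARCHAR2(4000), and that segment names are short. It has not been tuned or tested against the 90,000-row table.

```sql
-- Hypothetical row and collection types for the function's output.
CREATE OR REPLACE TYPE hl7_field_row AS OBJECT (
  seq             NUMBER,
  split_seq       VARCHAR2(40),
  seg_seq         NUMBER,
  split_seq_value VARCHAR2(4000)
);
/
CREATE OR REPLACE TYPE hl7_field_tab AS TABLE OF hl7_field_row;
/
CREATE OR REPLACE FUNCTION split_hl7 (p_seq IN NUMBER, p_msg IN CLOB)
  RETURN hl7_field_tab PIPELINED
IS
  TYPE seg_count_t IS TABLE OF PLS_INTEGER INDEX BY VARCHAR2(30);
  l_counts seg_count_t;                 -- occurrences of each segment name in this message
  l_len    PLS_INTEGER := NVL(DBMS_LOB.GETLENGTH(p_msg), 0);
  l_pos    PLS_INTEGER := 1;
  l_eol    PLS_INTEGER;
  l_rest   VARCHAR2(32767);
  l_field  VARCHAR2(32767);
  l_seg    VARCHAR2(30);
  l_tag    VARCHAR2(40);
  l_fno    PLS_INTEGER;
  l_cno    PLS_INTEGER;
  l_fstart PLS_INTEGER;
  l_fend   PLS_INTEGER;
  l_cstart PLS_INTEGER;
  l_cend   PLS_INTEGER;
BEGIN
  WHILE l_pos <= l_len LOOP
    -- Cut out one segment line (assumption: lines end with CHR(10)).
    l_eol := DBMS_LOB.INSTR(p_msg, CHR(10), l_pos);
    IF l_eol = 0 THEN
      l_eol := l_len + 1;
    END IF;
    -- Drop carriage returns and leading indentation; an empty line yields NULL.
    l_rest := LTRIM(REPLACE(
                DBMS_LOB.SUBSTR(p_msg, LEAST(l_eol - l_pos, 32000), l_pos),
                CHR(13)));
    l_pos  := l_eol + 1;
    -- The segment name is whatever precedes the first pipe.
    l_seg  := SUBSTR(l_rest, 1, INSTR(l_rest || '|', '|') - 1);

    IF l_seg IS NOT NULL THEN
      IF l_counts.EXISTS(l_seg) THEN
        l_counts(l_seg) := l_counts(l_seg) + 1;
      ELSE
        l_counts(l_seg) := 1;
      END IF;

      l_rest   := l_rest || '|';        -- sentinel so the last field is terminated too
      l_fno    := 0;
      l_fstart := 1;
      LOOP
        l_fend := INSTR(l_rest, '|', l_fstart);
        EXIT WHEN l_fend = 0;
        l_field := SUBSTR(l_rest, l_fstart, l_fend - l_fstart);
        l_tag   := l_seg || TO_CHAR(l_fno, 'FM9900');

        -- Split on ^ except for MSH02, which holds the encoding characters.
        IF INSTR(l_field, '^') > 0 AND NOT (l_seg = 'MSH' AND l_fno = 2) THEN
          l_field  := l_field || '^';
          l_cno    := 1;
          l_cstart := 1;
          LOOP
            l_cend := INSTR(l_field, '^', l_cstart);
            EXIT WHEN l_cend = 0;
            PIPE ROW (hl7_field_row(p_seq,
                                    l_tag || '-' || TO_CHAR(l_cno, 'FM9900'),
                                    l_counts(l_seg),
                                    SUBSTR(l_field, l_cstart, l_cend - l_cstart)));
            l_cno    := l_cno + 1;
            l_cstart := l_cend + 1;
          END LOOP;
        ELSE
          PIPE ROW (hl7_field_row(p_seq, l_tag, l_counts(l_seg), l_field));
        END IF;

        -- MSH exception: the field separator itself is reported as MSH01.
        IF l_seg = 'MSH' AND l_fno = 0 THEN
          PIPE ROW (hl7_field_row(p_seq, 'MSH01', l_counts(l_seg), '|'));
          l_fno := 2;
        ELSE
          l_fno := l_fno + 1;
        END IF;
        l_fstart := l_fend + 1;
      END LOOP;
    END IF;
  END LOOP;
  RETURN;
END split_hl7;
/
```

Empty fields come back as NULL, because Oracle treats a zero-length string as NULL. A trailing | (as in the PID line) therefore produces one extra empty field, which matches the PID12 row in the expected output.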

Later on, see the [table] usage along with PL/SQL.
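For the table usage, the query could look roughly like the following. This is a sketch that assumes a pipelined function, here given the hypothetical name split_hl7, which takes the SEQ and the CLOB and returns rows with the four columns from the question. The table name is quoted because it was created as "text_file" in lower case.

```sql
-- split_hl7 is a hypothetical pipelined function (seq NUMBER, msg CLOB).
-- It is invoked once per row of "text_file" through left correlation.
SELECT f.seq,
       f.split_seq,
       f.seg_seq,
       f.split_seq_value
FROM   "text_file" t,
       TABLE(split_hl7(t.seq, t.split_value)) f;
```

The rows usually come back in the order the function pipes them, but that order is not guaranteed without an ORDER BY. If the order matters, the function could also return a running row number to sort on.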
