
SQL regex: parse text to add in new lines

I'm trying to work with a notes field that is just one big block of text. Sample data is below, as I would insert it into the table.

create table test_table
(
job_number number,
notes varchar2(4000)
);

insert into test_table (job_number,notes)
values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

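I need to parse it so that there is a separate record for each note entry (the 10-digit number at the start of each note is a Unix timestamp). So if I were to export it pipe-delimited, it would look like this: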

job_number | notes

12345 | 1022089483 notes notes notes notes

12345 | 1022094450 notes notes notes notes

12345 | 1022095218 notes notes notes notes

I really hope this makes sense. I appreciate any insight.

There are several ways to do this:

SQL> insert into test_table (job_number,notes)
  2  values (12345,'1022089483 notes notes notes notes 1022094450 notes notes notes notes 1022095218 notes notes notes notes');

1 row created.

SQL> insert into test_table (job_number,notes)
  2  values (12346,'1022089483 notes notes notes notes 1022094450 foo 1022095218 test notes 1022493228 the answer is 42');

1 row created.

SQL> commit;

Commit complete.

Note: I've used [0-9]{10} as my regex to identify a note (i.e. any 10-digit number is treated as the start of a note).
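As a quick check on that pattern, the sketch below counts the 10-digit runs in each row. It assumes the two sample rows inserted above. REGEXP_COUNT needs 11g or later, while the length arithmetic also works on 10g. The column aliases are made up for illustration.

```sql
-- Count the note starts per row in two ways that agree on the sample data.
select job_number,
       -- 11g and later: count the matches directly
       regexp_count(notes, '[0-9]{10}') as cnt_regexp_count,
       -- 10g: characters removed by the replace, divided by 10 per match
       (length(notes) - length(regexp_replace(notes, '[0-9]{10}', null))) / 10 as cnt_length_diff
  from test_table
 order by job_number;
```

For the sample data this should report 3 for job 12345 and 4 for job 12346. One caveat: the length arithmetic returns NULL rather than 1 for a row whose notes consist of nothing but a single 10-digit number, because REGEXP_REPLACE then yields an empty string, which Oracle treats as NULL.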

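First, we can work out the maximum number of notes in any given row, do a Cartesian join against that many generated rows, and then filter out each note: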

SQL> with data
  2  as (select job_number, notes,
  3            (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  4        from test_table t)
  5  select job_number,
  6         substr(d.notes, regexp_instr(d.notes, '[0-9]{10}', 1, rn.l),
  7                       regexp_instr(d.notes||' 0000000000', '[0-9]{10}', 1, rn.l+1)
  8                       -regexp_instr(d.notes, '[0-9]{10}', 1, rn.l) -1
  9               ) note
 10    from data d
 11         cross join (select rownum l
 12                      from dual
 13                    connect by level <= (select max(num_of_notes)
 14                                           from data)) rn
 15   where rn.l <= d.num_of_notes
 16   order by job_number, rn.l;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.

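That's fine as long as the number of notes per row is generally similar (the greater the variance, the worse it scales, because we are doing a lot of extra lookups).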
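The SUBSTR arithmetic in the query above is easier to follow on a small literal. This sketch uses a made-up two-note string (not taken from the sample table) and shows why ' 0000000000' is appended: without it, there is no "next timestamp" to mark where the last note ends.

```sql
-- Hypothetical two-note string, used only to illustrate the arithmetic.
with s as (select '1022089483 a 1022094450 b' as notes from dual)
select regexp_instr(notes, '[0-9]{10}', 1, 1)                  as start_1,   -- 1
       regexp_instr(notes, '[0-9]{10}', 1, 2)                  as start_2,   -- 14
       regexp_instr(notes, '[0-9]{10}', 1, 3)                  as start_3,   -- 0: there is no third note
       regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, 3) as sentinel,  -- 27: start of the appended zeros
       -- last note = from start_2 up to, but not including, the space before the sentinel
       substr(notes, 14, 27 - 14 - 1)                          as last_note  -- '1022094450 b'
  from s;
```

The trailing "- 1" in the length drops the space that separates one note from the next timestamp.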

In 11g (Release 2 onward) we can use recursive subquery factoring to do the same as above, but without the extra looping:

SQL> with data (job_number, notes, note, num_of_notes, iter)
  2  as (select job_number, notes,
  3             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, 1),
  4                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, 2)
  5                    -regexp_instr(notes, '[0-9]{10}', 1, 1) -1
  6                  ),
  7             (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes,
  8             1
  9        from test_table
 10      union all
 11     select job_number, notes,
 12             substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 13                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 14                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 15                  ),
 16             num_of_notes, iter + 1
 17       from data
 18      where substr(notes, regexp_instr(notes, '[0-9]{10}', 1, iter+1),
 19                    regexp_instr(notes||' 0000000000', '[0-9]{10}', 1, iter+2)
 20                    -regexp_instr(notes, '[0-9]{10}', 1, iter+1) -1
 21                  ) is not null
 22    )
 23  select job_number, note
 24    from data
 25  order by job_number, iter;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42

7 rows selected.
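The recursion above stops because of how REGEXP_INSTR and SUBSTR behave once the notes run out: REGEXP_INSTR returns 0 for an occurrence that does not exist, so the computed length becomes 0 - 0 - 1 = -1, and SUBSTR returns NULL for any length below 1. A minimal illustration on a made-up single-note string:

```sql
-- Hypothetical one-note string: there is no second note to find.
with s as (select '1022089483 only note' as notes from dual)
select regexp_instr(notes, '[0-9]{10}', 1, 2)                  as next_start,  -- 0
       -- the sentinel is occurrence 2 here, so occurrence 3 is missing too
       regexp_instr(notes || ' 0000000000', '[0-9]{10}', 1, 3) as next_end,    -- 0
       -- negative length, so SUBSTR yields NULL and the "is not null" filter ends the recursion
       nvl(substr(notes, 0, 0 - 0 - 1), '<null>')              as next_note    -- '<null>'
  from s;
```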

Or, from 10g onward, we can use the MODEL clause to generate the rows:

SQL> with data as (select job_number, notes,
  2                       (length(notes)-length(regexp_replace(notes, '[0-9]{10}', null)))/10 num_of_notes
  3                  from test_table)
  4  select job_number, note
  5    from data
  6  model
  7  partition by (job_number)
  8  dimension by (1 as i)
  9  measures (notes, num_of_notes, cast(null as varchar2(4000)) note)
 10  rules
 11  (
 12    note[for i from 1 to num_of_notes[1] increment 1]
 13      = substr(notes[1],
 14               regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)),
 15               regexp_instr(notes[1]||' 0000000000', '[0-9]{10}', 1, cv(i)+1)
 16               -regexp_instr(notes[1], '[0-9]{10}', 1, cv(i)) -1
 17              )
 18  )
 19  order by job_number, i;

JOB_NUMBER NOTE
---------- --------------------------------------------------
     12345 1022089483 notes notes notes notes
     12345 1022094450 notes notes notes notes
     12345 1022095218 notes notes notes notes
     12346 1022089483 notes notes notes notes
     12346 1022094450 foo
     12346 1022095218 test notes
     12346 1022493228 the answer is 42
