简体   繁体   中英

How to select count from XML clob column?

I am trying to select the count of tags in the XML(CLOB) column.

I tried with

select regexp_count(diagram, 'userTask id=', 1, 'c') as "User Tasks",
       regexp_count(diagram, 'task id=', 1, 'c') as "Task"
from process_table

It works but, I need to get output faster than using regexp_count.

I tried with:

select count(xt.task),
count(xt.userTask)
from process_table process
cross join xmltable(
            xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
            '//definitions/process' passing xmltype(process.diagram) 
            columns
            task varchar2(20) path 'task',
            userTask varchar2(60) path 'userTask'
            ) xt

But, I am getting an error ORA-19279: XPTY0004 - XQuery dynamic type mismatch: expected singleton sequence - got multi-item sequence

I think it would be better to store the XML as XMLTYPE rather than CLOB . Then you can create a virtual column and you can even create indexes on it. Something like this:

CREATE TABLE PROCESS_TABLE (
    DIAGRAM XMLTYPE,
    TASK_COUNT      INTEGER GENERATED ALWAYS AS (
        TO_NUMBER(CAST(XMLQUERY('count(/definitions/process/task)' PASSING BY VALUE "DIAGRAM" RETURNING CONTENT) AS VARCHAR2(100)))
    ) VIRTUAL,
    USER_TASK_COUNT INTEGER  GENERATED ALWAYS AS (
        TO_NUMBER(CAST(XMLQUERY('count(/definitions/process/userTask)' PASSING BY VALUE "DIAGRAM" RETURNING CONTENT) AS VARCHAR2(100)))
    ) VIRTUAL
)
XMLTYPE DIAGRAM STORE AS SECUREFILE BINARY XML; 

CREATE INDEX IND_TASK_COUNT ON process_table (TASK_COUNT);

There are a few ways to do this. One is to use XMLTable to look for either child node, get the node names, and count how many times each appears:

select
  count(case when xt.name = 'userTask' then name end) as userTasks,
  count(case when xt.name = 'task' then name end) as tasks
from process_table process
cross join xmltable(
  xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
  '//definitions/process/(userTask|task)' passing xmltype(process.diagram) 
  columns
    name varchar2(20) path 'name(.)'
) xt

Or you could use FLWOR expressions to get the count for both child node types at once:

select userTasks, tasks
from process_table process
cross join xmltable(
  xmlnamespaces(default 'http://www.omg.org/spec/BPMN/20100524/MODEL'),
  'let $u := count(//definitions/process/userTask)
   let $s := count(//definitions/process/task)
   return <x><u>{$u}</u><s>{$s}</s></x>'
  passing xmltype(process.diagram) 
  columns
    userTasks number path 'u',
    tasks number path 's'
) xt

but I'm not sure that would end up any faster than the first option.

Another option is to use separate XMLQuery XPath counts for each node to check:

select
  xmlquery('declare default element namespace "http://www.omg.org/spec/BPMN/20100524/MODEL";
      count(//definitions/process/userTask)'
    passing xmltype(diagram)
    returning content) as userTasks,
  xmlquery('declare default element namespace "http://www.omg.org/spec/BPMN/20100524/MODEL";
      count(//definitions/process/task)'
    passing xmltype(diagram)
    returning content) as tasks
from process_table

db<>fiddle with a simple made-up XML CLOB that throws the error you see, and those three approaches (including converting the XMLQuery results to numbers).

I'm interested to see what other people come up with too.


XPath counts returns ORA-06502: PL/SQL: numeric or value error when there are more than 10 rows. When I add where clause works perfectly

When I add id = some_number, or when 10 rows are displayed it works.

This will error if a processed row has a null diagram ; but you can exclude those by just adding where diagram is not null .

db<>fiddle

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM