Oracle - How to capture a value under an XML tag

I need help with the following. If I want to capture the value of one particular tag, how should I approach it? For example:

I want to capture the value (800.422.2762 (U.S. and Canada)) from the tag below.

<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>

In short, I want to hard-code this tag so that its underlying value is captured every time my program runs.

Sample XML:

<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
   <fontspec id="0" size="27" family="Helvetica" color="#000000"/>
   <fontspec id="1" size="9" family="Helvetica" color="#000000"/>
   <fontspec id="2" size="9" family="Helvetica" color="#000000"/>
   <fontspec id="3" size="9" family="Times" color="#000000"/>
   <fontspec id="4" size="12" family="Helvetica" color="#000000"/>
   <fontspec id="5" size="12" family="Helvetica" color="#000000"/>
   <fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>

So far I have tried the approach below. It extracts the data successfully, but I want to extract one particular value based on the hard-coded tag. Kindly help with the approach.

WITH data
     AS (SELECT xmltype (
                   '<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
   <fontspec id="0" size="27" family="Helvetica" color="#000000"/>
   <fontspec id="1" size="9" family="Helvetica" color="#000000"/>
   <fontspec id="2" size="9" family="Helvetica" color="#000000"/>
   <fontspec id="3" size="9" family="Times" color="#000000"/>
   <fontspec id="4" size="12" family="Helvetica" color="#000000"/>
   <fontspec id="5" size="12" family="Helvetica" color="#000000"/>
   <fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>')
                   xmldoc
           FROM DUAL)
SELECT x.*
  FROM data,
       XMLTABLE ('/pdf2xml/page/text'
                 PASSING xmldoc
                 COLUMNS text VARCHAR2 (50) PATH '/text') x
/

Output:

TEXT
--------------------------------------------------
INVOICE
+1 913.217.6000, Fax +1 913.341.3742
800.422.2762 (U.S. and Canada)
headquarters@armaintl.org, www.arma.org
ARMA International

Just change the XQuery from

'/pdf2xml/page/text'

to

'/pdf2xml/page/text[@top=89]'

and the result will be

800.422.2762 (U.S. and Canada)
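
For reference, here is a minimal, self-contained sketch of that predicate in a complete query. It assumes a trimmed copy of the sample document (only a few of the `text` nodes are kept), adds a second attribute test on `@left` as an illustration of how to narrow the match further, and uses `PATH '.'` to read the value of the matched node itself. Treat it as one possible way to write it rather than the only one.

```sql
-- Trimmed copy of the sample document; only some <text> nodes are kept.
WITH data AS (
  SELECT xmltype(
'<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>') AS xmldoc
  FROM dual
)
SELECT x.text
  FROM data,
       -- The predicate keeps only the <text> node whose attributes match.
       XMLTABLE ('/pdf2xml/page/text[@top=89 and @left=611]'
                 PASSING xmldoc
                 -- '.' is the context node, i.e. the matched <text> element.
                 COLUMNS text VARCHAR2 (50) PATH '.') x;
```

Note that a match on position attributes only stays correct as long as the PDF layout, and therefore the `top` and `left` values, does not change between documents.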

Or change the query into:

SELECT x.*
FROM data,
   XMLTABLE ('/pdf2xml/page/text'
             PASSING xmldoc
             COLUMNS 
             text VARCHAR2 (50) PATH '/text',
             top  number        PATH '@top',
             left  number       PATH '@left',
             width  number      PATH '@width',
             height  number     PATH '@height',
             font    number     PATH '@font'
             ) x
where x.top = 89
and x.left = 611
and x.width = 177
and x.height = 11
and x.font = 1;

If you only have one source document and only want one node value, you could use XMLQuery instead of XMLTable, with a slight variation on @wolφi's XPath:

select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
  passing xmldoc
  returning content) as text
from data;

which gives you an XML fragment, or

select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
  passing xmldoc
  returning content).getStringVal() as text
from data;

which gives you a string:

TEXT                          
------------------------------
800.422.2762 (U.S. and Canada)

XMLTable is the way to go if you really have multiple documents or nodes though, of course.
