I need help with the following. If I want to capture the value of one particular tag, how should I approach it? For example,
I want to capture the value (800.422.2762 (U.S. and Canada)) from this tag:
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
In short, I want to hard-code this tag so that my program captures its value every time it runs.
Sample XML:
<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="27" family="Helvetica" color="#000000"/>
<fontspec id="1" size="9" family="Helvetica" color="#000000"/>
<fontspec id="2" size="9" family="Helvetica" color="#000000"/>
<fontspec id="3" size="9" family="Times" color="#000000"/>
<fontspec id="4" size="12" family="Helvetica" color="#000000"/>
<fontspec id="5" size="12" family="Helvetica" color="#000000"/>
<fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>
So far I have tried the approach below. It extracts the data successfully, but I want to extract one particular value based on the hard-coded tag. Please help with the approach.
WITH data
AS (SELECT xmltype (
'<?xml version="1.0" encoding="UTF-8"?>
<pdf2xml producer="poppler" version="0.51.0">
<page number="1" position="absolute" top="0" left="0" height="1188" width="918">
<fontspec id="0" size="27" family="Helvetica" color="#000000"/>
<fontspec id="1" size="9" family="Helvetica" color="#000000"/>
<fontspec id="2" size="9" family="Helvetica" color="#000000"/>
<fontspec id="3" size="9" family="Times" color="#000000"/>
<fontspec id="4" size="12" family="Helvetica" color="#000000"/>
<fontspec id="5" size="12" family="Helvetica" color="#000000"/>
<fontspec id="6" size="9" family="Helvetica" color="#000000"/>
<image top="27" left="54" width="203" height="108" src="ext-resources\bin\asdf-1_1.jpg"/>
<text top="103" left="346" width="123" height="28" font="0"><b>INVOICE</b></text>
<text top="75" left="611" width="211" height="11" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
<text top="89" left="611" width="177" height="11" font="1">800.422.2762 (U.S. and Canada)</text>
<text top="102" left="611" width="230" height="11" font="1">headquarters@armaintl.org, www.arma.org</text>
<text top="32" left="611" width="104" height="11" font="1">ARMA International</text>
</page>
</pdf2xml>')
xmldoc
FROM DUAL)
SELECT x.*
FROM data,
XMLTABLE ('/pdf2xml/page/text'
PASSING xmldoc
COLUMNS text VARCHAR2 (50) PATH '/text') x
/
Output:
TEXT
--------------------------------------------------
INVOICE
+1 913.217.6000, Fax +1 913.341.3742
800.422.2762 (U.S. and Canada)
headquarters@armaintl.org, www.arma.org
ARMA International
Just change the XQuery from
'/pdf2xml/page/text'
to
'/pdf2xml/page/text[@top=89]'
and the result will be
800.422.2762 (U.S. and Canada)
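If `top` alone might not identify the line uniquely, the predicate can combine several attributes. The following is a minimal sketch, assuming a trimmed-down copy of the sample document (only two `text` nodes kept for brevity) and using `'.'` as the column path to read the current node's text:

```sql
-- Sketch: filter inside the XPath predicate on two attributes at once.
-- The XML here is a shortened copy of the sample document from the question.
WITH data AS (
  SELECT XMLTYPE(
           '<pdf2xml producer="poppler" version="0.51.0">
              <page number="1">
                <text top="75" left="611" font="1">+1 913.217.6000, Fax +1 913.341.3742</text>
                <text top="89" left="611" font="1">800.422.2762 (U.S. and Canada)</text>
              </page>
            </pdf2xml>') AS xmldoc
  FROM dual
)
SELECT x.text
FROM data,
     XMLTABLE ('/pdf2xml/page/text[@top = 89 and @left = 611]'
               PASSING data.xmldoc
               COLUMNS text VARCHAR2 (50) PATH '.') x;
```

Only the node whose `top` and `left` attributes both match should come back, so the other `text` elements never reach the SQL layer.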
Or expose the attributes as columns and filter on them in the WHERE clause:
SELECT x.*
FROM data,
XMLTABLE ('/pdf2xml/page/text'
PASSING xmldoc
COLUMNS
text VARCHAR2 (50) PATH '/text',
top number PATH '@top',
left number PATH '@left',
width number PATH '@width',
height number PATH '@height',
font number PATH '@font'
) x
where x.top = 89
and x.left = 611
and x.width = 177
and x.height = 11
and x.font = 1;
If you only have one source document and only want one node value, you could use XMLQuery instead of XMLTable, with a slight variation on @wolφi's XPath:
select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
passing xmldoc
returning content) as text
from data;
which gives you an XML fragment, or
select XMLQuery('/pdf2xml/page/text[@top=89]/text()'
passing xmldoc
returning content).getStringVal() as text
from data;
which gives you a string:
TEXT
------------------------------
800.422.2762 (U.S. and Canada)
XMLTable is the way to go if you really have multiple documents or nodes though, of course.
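As a rough illustration of the multiple-document case, here is a sketch that assumes the documents are stored as text in a hypothetical `docs` row source with a `doc_id` and an `xml_text` column (neither appears in the question, and the second document's phone number is made up):

```sql
-- Sketch: one XMLTable call per source row, returning the matching node
-- from each document. doc_id, xml_text and the second document are
-- illustrative assumptions, not part of the original question.
WITH docs AS (
  SELECT 1 AS doc_id,
         '<pdf2xml><page number="1">
            <text top="89" left="611">800.422.2762 (U.S. and Canada)</text>
            <text top="32" left="611">ARMA International</text>
          </page></pdf2xml>' AS xml_text
  FROM dual
  UNION ALL
  SELECT 2,
         '<pdf2xml><page number="1">
            <text top="89" left="611">800.555.0100 (made-up example)</text>
          </page></pdf2xml>'
  FROM dual
)
SELECT d.doc_id, x.text
FROM docs d,
     XMLTABLE ('/pdf2xml/page/text[@top = 89]'
               PASSING XMLTYPE (d.xml_text)
               COLUMNS text VARCHAR2 (50) PATH '.') x
ORDER BY d.doc_id;
```

Each source row is expanded into the `text` nodes that satisfy the predicate, so the same hard-coded tag is picked out of every document in one query.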