[英]Extract XML into Rows from Oracle CLOB Field
I am trying to extract xml into aa table output separated by rows. 我试图将xml提取到以行分隔的表输出中。
The data is a CLOB field in Oracle Database as follows: 数据是Oracle数据库中的CLOB字段,如下所示:
<emailInfo>
<recipientList>
<recipientName>ATS</recipientName>
<recipientEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
<contactEmailList>
<emailAddress>wp2@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>pw@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
<recipientList>
<recipientName>ERG</recipientName>
<recipientEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>sl@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
<escalationEmailList>
<emailAddress>sl2@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
</emailInfo>
EDIT2: My updated SQL query is as follows: EDIT2:我更新的SQL查询如下:
SELECT t.*, m.*, p.*, l.*
FROM cisadm.F1_ext_lookup_val exval,
XMLTABLE ('/emailInfo/recipientList'
PASSING XMLTYPE (exval.bo_data_area)
COLUMNS recipient_name VARCHAR2 (4000) PATH 'recipientName',
recipient_email_list XMLTYPE PATH '/recipientEmailList',
contact_email_list XMLTYPE PATH '/contactEmailList',
escalation_email_list XMLTYPE PATH '/escalationEmailList') t,
XMLTABLE ('/recipientEmailList'
PASSING (t.recipient_email_list)
COLUMNS recipient_email_address VARCHAR2 (4000) PATH '/emailAddress',
rec_email_status_flg VARCHAR2 (10) PATH '/statusFlag') m,
XMLTABLE ('/contactEmailList'
PASSING (t.contact_email_list)
COLUMNS contact_email_address VARCHAR2 (4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2 (10) PATH 'contactEmailList/statusFlag'
) p,
XMLTABLE('/escalationEmailList'
PASSING (t.escalation_email_list)
COLUMNS esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) l
I am trying to provision for the fact that there may be multiple values for each Recipient email list, contact email list, and escalation email list. 我正在尝试规定以下事实:每个收件人电子邮件列表,联系人电子邮件列表和升级电子邮件列表可能有多个值。
Sample output should be: 样本输出应为:
Any help would be so appreciated! 任何帮助将不胜感激!
For future readers, here are general-purpose solutions in open-source programming to migrate XML data from a CLOB field into csv tabular format. 对于将来的读者来说,这是开源编程中的通用解决方案,可以将XML数据从CLOB字段迁移到csv表格格式。
Using the OP's data needs, these approaches are not dependent on any RDMS and hence can be used in other database connections. 使用OP的数据需求,这些方法不依赖于任何RDMS,因此可以在其他数据库连接中使用。 Additionally, limitations of SQL are overcome as various nuances like xpaths, arrays, loops can be used: 此外,由于可以使用xpath,数组,循环等各种细微差别,因此可以克服SQL的局限性:
Python (using cx_Oracle ): Python (使用cx_Oracle ):
#!/usr/bin/python
import os
import cx_Oracle
import csv
import lxml.etree as ET
# SET DIRECTORY PATH
cd = os.path.dirname(os.path.abspath(__file__))
# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd@database")
cur = db.cursor()
clob = cur.execute("SELECT CLOBfield FROM OracleTable").fetchone()
# CLOSE CURSOR AND DATABASE
cur.close()
db.close()
# PARSE XML CONTENT
dom = ET.fromstring(clob)
# DEFINING COLUMNS
columns = ['RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG']
emailnodes = ['recipientEmailList', 'contactEmailList', 'escalationEmailList']
# OPEN CSV FILE
with open(os.path.join(cd,'CLOB_Py.csv'), 'w', newline='') as m:
writer = csv.writer(m)
writer.writerow(columns)
nodexpath = dom.xpath('//recipientList')
dataline = []
for j in range(1,len(nodexpath)+1):
dataline = []
dataline.append(dom.xpath('//recipientList[{0}]/recipientName'.format(j))[0].text)
for n in emailnodes:
# EMAILS
childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[1]'.format(j, n))
# APPEND DATA LINES
for elem in childxpath:
dataline.append(elem.text)
if childxpath == []:
dataline.append('')
# FLAGS
childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[2]'.format(j, n))
# APPEND DATA LINES
for elem in childxpath:
dataline.append(elem.text)
if childxpath == []:
dataline.append('')
writer.writerow(dataline)
PHP (using PDO Oracle OCI ) PHP (使用PDO Oracle OCI )
// Set Directory Path
$cd = dirname(__FILE__);
// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";
try {
$dbh = new PDO($db,$db_username,$db_password);
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = "SELECT CLOBfield FROM OracleTable";
$STH = $dbh->query($sql);
$clob = $STH->fetch();
}
catch(PDOException $e) {
echo $e->getMessage();
exit;
}
# Closing db connection
$dbh = null;
// Loading XML source
$xpath = simplexml_load_string($clob);
// Writing column headers
$columns = array('RECIPENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG');
$emailnodes = array('recipientEmailList', 'contactEmailList', 'escalationEmailList');
$fs = fopen($cd.'/CLOB_PHP.csv', 'w');
fputcsv($fs, $columns);
fclose($fs);
// Writing data lines
$i = 1;
$values = [];
$node = $xpath->xpath('//recipientList');
foreach ($node as $n){
$child = $xpath->xpath('//recipientList['. $i .']/recipientName');
foreach($child as $value) {
$values[] = $value;
}
foreach ($emailnodes as $e){
// EMAILS
$child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[1]');
if (count($child) > 0) {
foreach($child as $value) {
$values[] = $value;
}
}
else {
$values[] = '';
}
// FLAGS
$child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[2]');
if (count($child) > 0) {
foreach($child as $value) {
$values[] = $value;
}
}
else {
$values[] = '';
}
}
$fs = fopen($cd.'/CLOB_PHP.csv', 'a');
fputcsv($fs, $values);
fclose($fs);
$values = [];
$i++;
}
R (using ROracle ): R (使用ROracle ):
library(XML)
library(ROracle)
setwd("C:\\Path\\To\\R\\Script")
# OPEN DATABASE AND QUERY
conn <-dbConnect(drv, username = "", password = "", dbname = "")
clobdf <- dbGetQuery(conn, "SELECT CLOBfield FROM OracleTable;")
dbDisconnect(conn)
# READ IN EXTERNAL DATA FILE
doc<-xmlParse(clobdf[[1,1]])
emailnodes <- c('recipientEmailList', 'contactEmailList', 'escalationEmailList')
# EXTRACT NODE VALUES INTO LISTS
recipientNamesList <- xpathSApply(doc, paste0("//recipientList/recipientName"), xmlValue)
for (e in emailnodes){
assign(e, xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[1]"), xmlValue))
}
for (e in emailnodes){
assign(paste0(e, "flg"), xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[2]"), xmlValue))
}
# COMBINE LISTS TO DATA FRAME
xmldf<- data.frame(RECIPENT_NAME = matrix(unlist(recipientNamesList), nrow=2, byrow=T),
RECIPIENT_EMAIL_ADDRESS = matrix(unlist(recipientEmailList), nrow=2, byrow=T),
REC_EMAIL_STATUS_FLG = matrix(unlist(recipientEmailListflg), nrow=2, byrow=T),
CONTACT_EMAIL_ADDRESS = matrix(unlist(contactEmailList), nrow=2, byrow=T),
CONTACT_EMAIL_STATUS_FLG = matrix(unlist(contactEmailListflg), nrow=2, byrow=T),
ESC_EMAIL_ADDRESS = matrix(unlist(escalationEmailList), nrow=2, byrow=T),
ESC_EMAIL_STATUS_FLG = matrix(unlist(escalationEmailListflg), nrow=2, byrow=T))
# OUTPUT TO CSV
write.csv(xmldf, "CLOB_R.csv", na = "", row.names=FALSE)
This query returns the data as in screenshot - 该查询返回数据,如屏幕截图所示-
select
extractvalue(s.column_value, '/*/recipientName') as recipient_name,
extractvalue(s.column_value, '/*/recipientEmailList/emailAddress') as recipient_email_address,
extractvalue(s.column_value, '/*/recipientEmailList/statusFlag') as rec_email_status_flg,
extractvalue(s.column_value, '/*/contactEmailList/emailAddress') as contact_email_address,
extractvalue(s.column_value, '/*/contactEmailList/statusFlag') as contact_email_status_flg,
extractvalue(s.column_value, '/*/escalationEmailList/emailAddress') as esc_email_address,
extractvalue(s.column_value, '/*/escalationEmailList/statusFlag') as esc_email_status_flg
from tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
and this query extract each email on a separate line - 然后此查询在单独的行中提取每封电子邮件-
select recipient_name, email_address, status_flag
from
(
select
recipient_name,
extractvalue(x.column_value, '/*/emailAddress') as email_address,
extractvalue(x.column_value, '/*/statusFlag') as status_flag
from
(
select
extractvalue(s.column_value, '/*/recipientName') as recipient_name,
EXTRACT(s.column_value, '/*') recipients
from tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
) v, table(xmlsequence(EXTRACT(v.recipients, '/*/*'))) x
)
where (email_address is not null or status_flag is not null)
You may try xmltable 您可以尝试xmltable
SELECT *
FROM XMLTable('/emailInfo/recipientList' PASSING XMLTYPE('<emailInfo>
<recipientList>
<recipientName>ATS</recipientName>
<recipientEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>wp@act.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>pw@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
<recipientList>
<recipientName>ERG</recipientName>
<recipientEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</recipientEmailList>
<contactEmailList>
<emailAddress>erg@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</contactEmailList>
<escalationEmailList>
<emailAddress>sl@wp.com.au</emailAddress>
<statusFlag>F1AC</statusFlag>
</escalationEmailList>
</recipientList>
</emailInfo>')
COLUMNS recipient_name VARCHAR2(4000) PATH 'recipientName',
recipient_email_address VARCHAR2(4000) PATH 'recipientEmailList/emailAddress',
rec_email_status_flg VARCHAR2(10) PATH 'recipientEmailList/statusFlag',
contact_email_address VARCHAR2(4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2(10) PATH 'contactEmailList/statusFlag',
esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) t
Same from table 从表相同
SELECT *
FROM tmp,XMLTable('/emailInfo/recipientList' PASSING XMLTYPE(tmp.bo_data_area)
COLUMNS recipient_name VARCHAR2(4000) PATH 'recipientName',
recipient_email_address VARCHAR2(4000) PATH 'recipientEmailList/emailAddress',
rec_email_status_flg VARCHAR2(10) PATH 'recipientEmailList/statusFlag',
contact_email_address VARCHAR2(4000) PATH 'contactEmailList/emailAddress',
contact_email_status_flg VARCHAR2(10) PATH 'contactEmailList/statusFlag',
esc_email_address VARCHAR2(4000) PATH 'escalationEmailList/emailAddress',
esc_email_status_flg VARCHAR2(10) PATH 'escalationEmailList/statusFlag'
) t
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.