
Extract XML into Rows from Oracle CLOB Field

I am trying to extract XML into table output, separated into rows.

The data is a CLOB field in an Oracle database, as follows:

<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <contactEmailList>
   <emailAddress>wp2@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
  <escalationEmailList>
   <emailAddress>sl2@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>

EDIT2: My updated SQL query is as follows:

             SELECT t.*, m.*, p.*, l.*
             FROM cisadm.F1_ext_lookup_val exval,

                  XMLTABLE ('/emailInfo/recipientList'
                     PASSING XMLTYPE (exval.bo_data_area)
                     COLUMNS recipient_name                VARCHAR2 (4000)  PATH 'recipientName',
                             recipient_email_list          XMLTYPE          PATH '/recipientEmailList',
                             contact_email_list            XMLTYPE          PATH '/contactEmailList',
                             escalation_email_list         XMLTYPE          PATH '/escalationEmailList') t,
                  XMLTABLE ('/recipientEmailList'
                     PASSING (t.recipient_email_list)
                     COLUMNS recipient_email_address       VARCHAR2 (4000)  PATH '/emailAddress',
                             rec_email_status_flg          VARCHAR2 (10)    PATH '/statusFlag') m,
                  XMLTABLE ('/contactEmailList'
                     PASSING (t.contact_email_list)
                     COLUMNS contact_email_address         VARCHAR2 (4000)  PATH 'contactEmailList/emailAddress',
                             contact_email_status_flg      VARCHAR2 (10)    PATH 'contactEmailList/statusFlag'
                             ) p,
                  XMLTABLE('/escalationEmailList'
                     PASSING (t.escalation_email_list)
                     COLUMNS     esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                                 esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
                      ) l

I am trying to allow for the fact that there may be multiple values for each recipient email list, contact email list, and escalation email list.

Sample output should be:

[screenshot: sample output]

Any help would be much appreciated!

For future readers, here are general-purpose solutions in open-source programming languages that migrate XML data from a CLOB field into CSV tabular format.

Built around the OP's data needs, these approaches do not depend on any particular RDBMS and can therefore be used with other database connections. They also get around limitations of SQL, since XPath expressions, arrays, and loops are all available:

Python (using cx_Oracle):

#!/usr/bin/python
import os
import cx_Oracle
import csv
import lxml.etree as ET

# SET DIRECTORY PATH
cd = os.path.dirname(os.path.abspath(__file__))

# DB CONNECTION AND QUERY
db = cx_Oracle.connect("uid/pwd@database")    
cur = db.cursor()
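row = cur.execute("SELECT CLOBfield FROM OracleTable").fetchone()

# cx_Oracle returns CLOB columns as LOB objects, so read the text
# before the connection is closed
clob = row[0].read()

# CLOSE CURSOR AND DATABASE
cur.close()
db.close()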

# PARSE XML CONTENT
dom = ET.fromstring(clob)

# DEFINING COLUMNS
columns = ['RECIPIENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
           'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
           'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG']

emailnodes = ['recipientEmailList', 'contactEmailList', 'escalationEmailList']

# OPEN CSV FILE
with open(os.path.join(cd,'CLOB_Py.csv'), 'w', newline='') as m:
    writer = csv.writer(m)    
    writer.writerow(columns)

    nodexpath = dom.xpath('//recipientList')

    dataline = []    
    for j in range(1,len(nodexpath)+1):

        dataline = []        
        dataline.append(dom.xpath('//recipientList[{0}]/recipientName'.format(j))[0].text)

        for n in emailnodes:   
            # EMAILS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[1]'.format(j, n))            

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

            # FLAGS
            childxpath = dom.xpath('//recipientList[{0}]/{1}[1]/*[2]'.format(j, n))

            # APPEND DATA LINES   
            for elem in childxpath:
                dataline.append(elem.text)

            if childxpath == []:
                dataline.append('')

        writer.writerow(dataline)
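
The script above reads only the first entry (`[1]`) of each email list, so a second `contactEmailList` or `escalationEmailList` under the same recipient is dropped. As a rough sketch of one way to keep them, assuming the same XML layout, the entries of each list can be aligned side by side with `itertools.zip_longest`, padding shorter lists with blanks. The function name `wide_rows` and the inline sample are illustrative only, and the standard library's `xml.etree.ElementTree` stands in for `lxml` here:

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest

EMAIL_NODES = ["recipientEmailList", "contactEmailList", "escalationEmailList"]


def wide_rows(xml_text):
    """Return rows of [name, rec email, rec flag, contact email, contact flag,
    esc email, esc flag]; the n-th entries of each list share a row."""
    root = ET.fromstring(xml_text)
    rows = []
    for recipient in root.findall("recipientList"):
        name = recipient.findtext("recipientName", default="")
        # One list of (email, flag) pairs per email node type
        groups = [
            [(e.findtext("emailAddress", default=""),
              e.findtext("statusFlag", default=""))
             for e in recipient.findall(node)]
            for node in EMAIL_NODES
        ]
        # Align the n-th entry of each list; pad shorter lists with blanks.
        # A recipient with no email lists at all still gets one blank row.
        combos = list(zip_longest(*groups, fillvalue=("", "")))
        if not combos:
            combos = [(("", ""),) * len(EMAIL_NODES)]
        for combo in combos:
            row = [name]
            for email, flag in combo:
                row.extend([email, flag])
            rows.append(row)
    return rows


# Illustrative sample: one recipient with two contact entries
sample = """<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList><emailAddress>wp@act.com.au</emailAddress><statusFlag>F1AC</statusFlag></recipientEmailList>
  <contactEmailList><emailAddress>wp@act.com.au</emailAddress><statusFlag>F1AC</statusFlag></contactEmailList>
  <contactEmailList><emailAddress>wp2@act.com.au</emailAddress><statusFlag>F1AC</statusFlag></contactEmailList>
  <escalationEmailList><emailAddress>pw@wp.com.au</emailAddress><statusFlag>F1AC</statusFlag></escalationEmailList>
 </recipientList>
</emailInfo>"""

rows = wide_rows(sample)

# Write to an in-memory buffer; swap in a real file handle as needed
out = io.StringIO()
csv.writer(out).writerows(rows)
print(out.getvalue())
```

Whether the second contact address belongs on its own row with blank recipient and escalation columns, or repeated against the first ones, depends on the output the OP wants; this sketch takes the padded form.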

PHP (using PDO Oracle OCI):

<?php
// Set Directory Path
$cd = dirname(__FILE__);

// Opening db connection
$db_username = "your_username";
$db_password = "your_password";
$db = "oci:dbname=your_sid";

try {
    $dbh = new PDO($db,$db_username,$db_password);          
    $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $sql = "SELECT CLOBfield FROM OracleTable";    
    $STH = $dbh->query($sql);    
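    $row = $STH->fetch(PDO::FETCH_NUM);
    // PDO_OCI may return a CLOB as a stream resource; read it into a string
    $clob = is_resource($row[0]) ? stream_get_contents($row[0]) : $row[0];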
}

catch(PDOException $e) {  
    echo $e->getMessage();
    exit;
}

// Closing db connection
$dbh = null;

// Loading XML source
$xpath = simplexml_load_string($clob);

// Writing column headers
$columns = array('RECIPIENT_NAME', 'RECIPIENT_EMAIL_ADDRESS', 'REC_EMAIL_STATUS_FLG',
                 'CONTACT_EMAIL_ADDRESS', 'CONTACT_EMAIL_STATUS_FLG',
                 'ESC_EMAIL_ADDRESS', 'ESC_EMAIL_STATUS_FLG');

$emailnodes = array('recipientEmailList', 'contactEmailList', 'escalationEmailList');

$fs = fopen($cd.'/CLOB_PHP.csv', 'w');
fputcsv($fs, $columns);      
fclose($fs);    

// Writing data lines
$i = 1;
$values = [];
$node = $xpath->xpath('//recipientList');    

foreach ($node as $n){

     $child = $xpath->xpath('//recipientList['. $i .']/recipientName');
     foreach($child as $value) {            
          $values[] = $value;         
     }

     foreach ($emailnodes as $e){

          // EMAILS       
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[1]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }

          // FLAGS
          $child = $xpath->xpath('//recipientList['. $i .']/'. $e.'[1]/*[2]');

          if (count($child) > 0) {
              foreach($child as $value) {           
                 $values[] = $value;         
              }
          }   
          else {
                 $values[] = '';
          }
     }  

     $fs = fopen($cd.'/CLOB_PHP.csv', 'a');
     fputcsv($fs, $values);      
     fclose($fs);  

     $values = [];
     $i++;

}

R (using ROracle):

library(XML)
library(ROracle)

setwd("C:\\Path\\To\\R\\Script")

# OPEN DATABASE AND QUERY
drv <- dbDriver("Oracle")
conn <- dbConnect(drv, username = "", password = "", dbname = "")
# No trailing semicolon: Oracle rejects it in statements sent through the driver
clobdf <- dbGetQuery(conn, "SELECT CLOBfield FROM OracleTable")
dbDisconnect(conn)

# PARSE XML CONTENT
doc <- xmlParse(clobdf[[1,1]], asText = TRUE)

emailnodes <- c('recipientEmailList', 'contactEmailList', 'escalationEmailList')

# EXTRACT NODE VALUES INTO LISTS
recipientNamesList <- xpathSApply(doc, paste0("//recipientList/recipientName"), xmlValue)

for (e in emailnodes){
    assign(e, xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[1]"), xmlValue))
}

for (e in emailnodes){
  assign(paste0(e, "flg"), xpathSApply(doc, paste0("//recipientList/", e, "[1]/*[2]"), xmlValue))
}

# COMBINE LISTS TO DATA FRAME
xmldf<- data.frame(RECIPIENT_NAME =  matrix(unlist(recipientNamesList), nrow=2, byrow=T),
                   RECIPIENT_EMAIL_ADDRESS = matrix(unlist(recipientEmailList), nrow=2, byrow=T),
                   REC_EMAIL_STATUS_FLG  = matrix(unlist(recipientEmailListflg), nrow=2, byrow=T),
                   CONTACT_EMAIL_ADDRESS = matrix(unlist(contactEmailList),   nrow=2, byrow=T),                
                   CONTACT_EMAIL_STATUS_FLG = matrix(unlist(contactEmailListflg),   nrow=2, byrow=T),                
                   ESC_EMAIL_ADDRESS = matrix(unlist(escalationEmailList), nrow=2, byrow=T),
                   ESC_EMAIL_STATUS_FLG = matrix(unlist(escalationEmailListflg), nrow=2, byrow=T))   

# OUTPUT TO CSV
write.csv(xmldf, "CLOB_R.csv", na = "", row.names=FALSE)

This query returns the data as shown in the screenshot:

select 
    extractvalue(s.column_value, '/*/recipientName') as recipient_name,
    extractvalue(s.column_value, '/*/recipientEmailList/emailAddress') as recipient_email_address,
    extractvalue(s.column_value, '/*/recipientEmailList/statusFlag') as rec_email_status_flg,
    extractvalue(s.column_value, '/*/contactEmailList/emailAddress') as contact_email_address,
    extractvalue(s.column_value, '/*/contactEmailList/statusFlag') as contact_email_status_flg,
    extractvalue(s.column_value, '/*/escalationEmailList/emailAddress') as esc_email_address,
    extractvalue(s.column_value, '/*/escalationEmailList/statusFlag') as esc_email_status_flg
from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s

And this query extracts each email on a separate row:

select recipient_name, email_address, status_flag
 from
(
    select 
           recipient_name,
           extractvalue(x.column_value, '/*/emailAddress') as email_address,
           extractvalue(x.column_value, '/*/statusFlag') as status_flag
    from
    (
        select 
            extractvalue(s.column_value, '/*/recipientName') as recipient_name,
            EXTRACT(s.column_value, '/*') recipients
        from  tmp, table(xmlsequence(EXTRACT(XMLTYPE(tmp.bo_data_area), '/emailInfo/recipientList'))) s
    ) v, table(xmlsequence(EXTRACT(v.recipients, '/*/*'))) x
)
where (email_address is not null or status_flag is not null)
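
For comparison with the open-source approaches earlier in the thread, here is a minimal Python sketch of the same "one row per email" idea. It walks every child of each `recipientList` (much like the `/*/*` step in the query) and skips children that have neither an `emailAddress` nor a `statusFlag`. Unlike the query, it also records which list each address came from. The function name `long_rows`, the extra list-type column, and the inline sample are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET


def long_rows(xml_text):
    """Return (recipient_name, list_type, email_address, status_flag) tuples,
    one per email entry, in document order."""
    root = ET.fromstring(xml_text)
    rows = []
    for recipient in root.findall("recipientList"):
        name = recipient.findtext("recipientName", default="")
        for child in recipient:
            email = child.findtext("emailAddress")
            flag = child.findtext("statusFlag")
            # Skip children such as recipientName that carry neither value
            if email is None and flag is None:
                continue
            rows.append((name, child.tag, email or "", flag or ""))
    return rows


# Illustrative sample: one recipient with two escalation entries
sample = """<emailInfo>
 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList><emailAddress>erg@wp.com.au</emailAddress><statusFlag>F1AC</statusFlag></recipientEmailList>
  <escalationEmailList><emailAddress>sl@wp.com.au</emailAddress><statusFlag>F1AC</statusFlag></escalationEmailList>
  <escalationEmailList><emailAddress>sl2@wp.com.au</emailAddress><statusFlag>F1AC</statusFlag></escalationEmailList>
 </recipientList>
</emailInfo>"""

result = long_rows(sample)
for row in result:
    print(row)
```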

You may try XMLTABLE:

SELECT *
    FROM XMLTable('/emailInfo/recipientList' PASSING XMLTYPE('<emailInfo>
 <recipientList>
  <recipientName>ATS</recipientName>
  <recipientEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>wp@act.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>pw@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>

 <recipientList>
  <recipientName>ERG</recipientName>
  <recipientEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </recipientEmailList>
  <contactEmailList>
   <emailAddress>erg@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </contactEmailList>
  <escalationEmailList>
   <emailAddress>sl@wp.com.au</emailAddress>
   <statusFlag>F1AC</statusFlag>
  </escalationEmailList>
 </recipientList>
</emailInfo>')
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t

The same, reading from a table:

SELECT *
    FROM tmp,XMLTable('/emailInfo/recipientList' PASSING XMLTYPE(tmp.bo_data_area)
                  COLUMNS recipient_name            VARCHAR2(4000)   PATH 'recipientName',
                          recipient_email_address   VARCHAR2(4000)   PATH 'recipientEmailList/emailAddress',
                          rec_email_status_flg      VARCHAR2(10)     PATH 'recipientEmailList/statusFlag',
                          contact_email_address     VARCHAR2(4000)   PATH 'contactEmailList/emailAddress',
                          contact_email_status_flg  VARCHAR2(10)     PATH 'contactEmailList/statusFlag',
                          esc_email_address         VARCHAR2(4000)   PATH 'escalationEmailList/emailAddress',
                          esc_email_status_flg      VARCHAR2(10)     PATH 'escalationEmailList/statusFlag'
) t
