简体   繁体   中英

Loading data from an XML document (not a file)

I have a lot of logfiles that I want to load into a SQL Server database via a Talend job so that I can preserve the logs without keeping the files in the file system (we generate over 3k log files every day). I do not want to change the jobs to write to the database directly as I don't want the jobs to be dependent on the DB server being available.

I have created a Talend job that can read a log file and convert it to an XML document which I can then send to a SQL Server (2016) stored procedure. The stored procedure is executed, however the contents of the log file are NOT being stored in the database.

I have ruled out the following possible causes:

  • connectivity/permissions - I have logged in to SSMS with the account used by the Talend job and executed the procedure (minus the XML header since I have not found a way to have SSMS allow UTF-8)

The stored procedure is:

ALTER   procedure [myschema].[myProc]
  @logXML as xml,
  @logFile as varchar(max),
  @result as varchar(max) OUT

as

begin
  declare @myAction varchar(max);

  begin try
    begin transaction
      set @myAction = 'inserting log file ' + @logFile;
      insert into [myDB].[mySchema].[myTable]
      (
        JobName,
        LogName,
        RunDate,
        LogLineNum,
        LogLine
      )
      (select logs.value('JobName[1]', 'varchar(500)') as JobName,
              logs.value('LogName[1]', 'varchar(500)') as LogName,
          logs.value('runDate[1]', 'varchar(20)') as runDate,
          lines.value('Number[1]', 'integer') as LogLineNum,
          lines.value('Content[1]', 'varchar(max)') as LogLine
         from @logXML.nodes('/LogFileContents') as l1(logs),
              @logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines)
      );
    commit transaction;
    set @result = 'SUCCESS';
  end try
  begin catch
  if @@TRANCOUNT > 0
    begin
      rollback
      set @result = 'Error ' + @myAction + '.  Error code: ' + @@ERROR;
    end
  end catch
end

The XML document generated by the Talend job looks like the following:

<?xml version="1.0" encoding="UTF-8"?> // I do not have control of this - I get this "free" from Talend
<LogFileContents>
  <JobName>myTask</JobName>
  <LogName>myLogFile</LogName>
  <runDate>YYYYMMDDHHmiss</runDate>
  <LogLines>
    <Line>
      <Number>1</Number>
      <Content>1st Log Message</Content>
    </Line>
    <Line>
      <Number>2</Number>
      <Content>2nd Log Message</Content>
    </Line>
    ...
    <Line>
      <Number>Last</Number>
      <Content>Last Log Message</Content>
    </Line>
  </LogLines>
</LogFileContents>

In Talend I have tried passing the XML document to the tDBSP as a document (which results in a Talend Error). It is currently configured to pass the XML document as a String - I had to add the "sendStringParametersAsUnicode=false" to my connection string (additional parameters) to eliminate the "unable to switch encoding" error. When I check the result from the call to the stored proc I get "SUCCESS". I have even tried having the stored proc pass back a non-sense value for the result instead of "SUCCESS".

In the stored proc I have tried changing the input parameter to a varchar(max) and then converting that to an XML variable via

@logXML as varchar(max),
...
declare @myXML xml = convert(xml, @logXML)
...
  from @myXML.nodes(...) as l1(logs)
       @myXML.nodes(...) as l2(lines)

which also did not load the document into the database.

However if I execute the stored procedure via SSMS using the XML Document (only difference is the missing <?xml...> document header):

<LogFileContents>
  <JobName>myTask</JobName>
  <LogName>myLogFile</LogName>
  <runDate>YYYYMMDDHHmiss</runDate>
  <LogLines>
    <Line>
      <Number>1</Number>
      <Content>1st Log Message</Content>
    </Line>
    <Line>
      <Number>2</Number>
      <Content>2nd Log Message</Content>
    </Line>
    ...
    <Line>
      <Number>Last</Number>
      <Content>Last Log Message</Content>
    </Line>
  </LogLines>
</LogFileContents>

The entire contents loads as expected (The log file I am testing with has 40 lines and all 40 show up in the database, exactly as I expect).

I have also reviewed the following links - but they are in regards to processing XML files, not XML strings.

Parsing XML Data Into SQL Server

Importing XML file into SQL Server 2000 using OPENROWSET

I feel like there's something not quite configured right in Talend but I'm not sure where that would be. Any input/guidance would be greatly appreciated.

Your XML shredding has no issues outside of performance.

Please see below.

So, it seems that the issue is some kind of environment related (Talend?.).

SQL

DECLARE @logXML XML = 
'<?xml version="1.0" encoding="UTF-8"?>
<LogFileContents>
    <JobName>myTask</JobName>
    <LogName>myLogFile</LogName>
    <runDate>YYYYMMDDHHmiss</runDate>
    <LogLines>
        <Line>
            <Number>1</Number>
            <Content>1st Log Message</Content>
        </Line>
        <Line>
            <Number>2</Number>
            <Content>2nd Log Message</Content>
        </Line>...
        <Line>
            <Number>10000</Number>
            <Content>Last Log Message</Content>
        </Line>
    </LogLines>
</LogFileContents>';

SELECT logs.value('(JobName/text())[1]', 'varchar(500)') as JobName
    , logs.value('(LogName/text())[1]', 'varchar(500)') as LogName
    , logs.value('(runDate/text())[1]', 'varchar(20)') as runDate
    , lines.value('(Number/text())[1]', 'integer') as LogLineNum
    , lines.value('(Content/text())[1]', 'varchar(max)') as LogLine
FROM @logXML.nodes('/LogFileContents') as l1(logs)
    , @logXML.nodes('/LogFileContents/LogLines/Line') as l2(lines);

Output

JobName LogName runDate LogLineNum LogLine
myTask myLogFile YYYYMMDDHHmiss 1 1st Log Message
myTask myLogFile YYYYMMDDHHmiss 2 2nd Log Message
myTask myLogFile YYYYMMDDHHmiss 10000 Last Log Message

So it seems that, even though I have a commit in the stored procedure, I still need an external (in Talend) commit (tDBCommit). From what I've been able to gather it has to do with the fact that I use a tDBConnection to establish the DB connection instead of configuring the tDBSP to make a connection to the DB.

Although I haven't tried it, I suspect that I could also set the "Auto Commit" property of the tDBConnection and that would work as well.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM