简体   繁体   中英

SSIS component to remove linefeeds from all (n)varchar fields in dataflow

I use a SSIS component to retrieve data from an API. The API delivers XML, This XML contains data like this:

<tag>
code
</tag>

Linefeeds before and after 'code'

That means that what is entered into my (n)varchar columns in SQL looks like this: CHAR(10)codeCHAR(10)

That messes up a lot, among other things the way things look in the reports. So the CHAR(10) needs to disappear. It needs to be filtered out between the source component and the destination component in my dataflow. I could use derived columns for this but it concerns 9 dataflows with (in total) 385 (n)varchar fields. A lot of work! Is it possible to use a script component that simply does a replace(field,linefeed,'') on each (n)varchar? So that all data passes through this component and exits stripped of linefeeds?

If so, how do I do this? I am pretty ok with SQL, can read most languages but need some help on writing this in a C# of VB.NET. I am designing this in VS2019 and deploying to ADFv2 IS. Targets are SQL Azure databases.

I cannot simply run some SQL after the SSIS job has run since the next run will do a upsert on the data. It will then conclude that CHAR(10)codeCHAR(10) is different from code and insert a new line, violating the unique constraint on code . So that's why I need to do it after getting the data and before writing the data.

You don't necessarily need SSIS to do this. You can use the TRIM function in SQL Server 2017 and later and Azure SQL DB. Insert your XML into a staging table unchanged from the original. Then call a stored procedure that cleans up the XML and optionally inserts it into another table. An example:

--INSERT INTO ...
SELECT 
    TRIM( CHAR(10) + CHAR(13) FROM someXML.value( '(tag/text())[1]', 'VARCHAR(20)' ) ) trimmedTag
FROM xmlTest

TRIM gives you the option to add characters you want to remove. In the above example I've used the CHAR function to remove line feed (10) and carriage return (13) respectively.

From within SSIS you could use an Execute SQL Task to call the proc.

I could not find a solution that would not require extensive coding, at the cost of performance. One solution I tried was writing the XML as a CSV, do a search and replace on the CSV en then import that into a table. Though it worked, it made the solution unnessecary complex and it was detrimental in terms of performance. It was fighting symptoms, not curing the issue. So I went and talked to the guys delivering the API and they removed these linefeeds. That turned out to be the only good solution.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM