I have a table called myTextsTable (myTextsTable_id INT, myTextsTable_text VARCHAR(MAX)). This table has around 4 million records, and I am trying to remove any instance of the ASCII characters 0-8, 11, 12, 14-31 and 127 from the VARCHAR(MAX) column myTextsTable_text.
I have written the following SQL query, which takes under 10 minutes on SQL Server 2012 but had still not finished on SQL Server 2008 R2 after two hours (so I stopped the execution). Please note that I restored a backup of the SQL Server 2008 R2 database on SQL Server 2012 (i.e. the data is exactly the same).
BEGIN TRANSACTION [Tran1]
BEGIN TRY
UPDATE myTextsTable
SET myTextsTable_text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myTextsTable_text, CHAR(0), ''), CHAR(1), ''), CHAR(2), ''), CHAR(3), ''), CHAR(4), ''), CHAR(5), ''), CHAR(6), ''), CHAR(7), ''), CHAR(8), ''), CHAR(11), ''), CHAR(12), ''), CHAR(14), ''), CHAR(15), ''), CHAR(16), ''), CHAR(17), ''), CHAR(18), ''), CHAR(19), ''), CHAR(20), ''), CHAR(21), ''), CHAR(22), ''), CHAR(23), ''), CHAR(24), ''), CHAR(25), ''), CHAR(26), ''), CHAR(27), ''), CHAR(28), ''), CHAR(29), ''), CHAR(30), ''), CHAR(31), ''), CHAR(127), '')
WHERE myTextsTable_text LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127) + ']%';
COMMIT TRANSACTION [Tran1];
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION [Tran1];
--PRINT ERROR_MESSAGE();
END CATCH;
Only 135 records are affected. As the single UPDATE query wasn't working on SQL Server 2008 R2, I tried the following approach with a temp table.
BEGIN TRANSACTION [Tran1]
BEGIN TRY
IF OBJECT_ID('tempdb..#myTextsTable') IS NOT NULL DROP TABLE #myTextsTable;
SELECT myTextsTable_id, myTextsTable_text
INTO #myTextsTable
FROM myTextsTable
WHERE myTextsTable_text LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127) + ']%';
UPDATE #myTextsTable
SET myTextsTable_text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myTextsTable_text, CHAR(0), ''), CHAR(1), ''), CHAR(2), ''), CHAR(3), ''), CHAR(4), ''), CHAR(5), ''), CHAR(6), ''), CHAR(7), ''), CHAR(8), ''), CHAR(11), ''), CHAR(12), ''), CHAR(14), ''), CHAR(15), ''), CHAR(16), ''), CHAR(17), ''), CHAR(18), ''), CHAR(19), ''), CHAR(20), ''), CHAR(21), ''), CHAR(22), ''), CHAR(23), ''), CHAR(24), ''), CHAR(25), ''), CHAR(26), ''), CHAR(27), ''), CHAR(28), ''), CHAR(29), ''), CHAR(30), ''), CHAR(31), ''), CHAR(127), '')
UPDATE myTextsTable
SET myTextsTable_text = new.myTextsTable_text
FROM myTextsTable
INNER JOIN #myTextsTable new ON new.myTextsTable_id=myTextsTable.myTextsTable_id
DROP TABLE #myTextsTable;
COMMIT TRANSACTION [Tran1];
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION [Tran1];
--PRINT ERROR_MESSAGE();
END CATCH;
However, the result is the same: it works perfectly well on SQL Server 2012, but not on SQL Server 2008 R2. The UPDATE query was still executing after two hours. (The records were saved into the temp table #myTextsTable within a few minutes; I checked this afterwards to see which part was taking longer.)
As neither of the two approaches above worked, I tried TABLE variables just to check whether that makes any difference, but the result was the same (i.e. it works fine on SQL Server 2012 but not on SQL Server 2008 R2).
BEGIN TRANSACTION [Tran1]
BEGIN TRY
DECLARE @myTextsTable TABLE (myTextsTable_id INT, myTextsTable_text VARCHAR(MAX))
INSERT INTO @myTextsTable(myTextsTable_id, myTextsTable_text)
SELECT myTextsTable_id, myTextsTable_text
FROM myTextsTable
WHERE myTextsTable_text LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127) + ']%';
UPDATE @myTextsTable
SET myTextsTable_text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myTextsTable_text, CHAR(0), ''), CHAR(1), ''), CHAR(2), ''), CHAR(3), ''), CHAR(4), ''), CHAR(5), ''), CHAR(6), ''), CHAR(7), ''), CHAR(8), ''), CHAR(11), ''), CHAR(12), ''), CHAR(14), ''), CHAR(15), ''), CHAR(16), ''), CHAR(17), ''), CHAR(18), ''), CHAR(19), ''), CHAR(20), ''), CHAR(21), ''), CHAR(22), ''), CHAR(23), ''), CHAR(24), ''), CHAR(25), ''), CHAR(26), ''), CHAR(27), ''), CHAR(28), ''), CHAR(29), ''), CHAR(30), ''), CHAR(31), ''), CHAR(127), '')
UPDATE myTextsTable
SET myTextsTable_updated = GETDATE()
,myTextsTable_updatedby = 'As per V87058'
,myTextsTable_text = new.myTextsTable_text
FROM myTextsTable
INNER JOIN @myTextsTable new ON new.myTextsTable_id=myTextsTable.myTextsTable_id
COMMIT TRANSACTION [Tran1];
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION [Tran1];
--PRINT ERROR_MESSAGE();
END CATCH;
Could anyone explain why this happens, and how I can make this SQL query work on SQL Server 2008 R2?
Note: I know that string manipulation in the database layer is not ideal, and that it is usually recommended to do it in the application layer before saving to the database. However, I am trying to understand why this is a problem in one version and not in the other.
SQL Server 2012
Microsoft SQL Server 2012 - 11.0.5058.0 (X64)
Standard Edition (64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor)
SQL Server 2008 R2
Microsoft SQL Server 2012 - 11.0.5058.0 (X64)
Standard Edition (64-bit) on Windows NT 6.3 (Build 9600: ) (Hypervisor)
This is a known issue on SQL Server 2008 with LOB datatypes and certain collations.
It is easy to reproduce:
/*Hangs on 2008*/
DECLARE @VcMax varchar(max)= char(0) + 'a'
SELECT REPLACE(@VcMax COLLATE Latin1_General_CS_AS, char(0), '')
While hung, it is CPU bound and seems to be stuck in an infinite loop.
The fix is easy too: either use a non-MAX datatype, or use a binary collation:
/*Doesn't Hang*/
DECLARE @VcMax varchar(max)= char(0) + 'a'
SELECT REPLACE(@VcMax COLLATE Latin1_General_100_BIN2, char(0), '')
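The non-MAX option mentioned above might look like the following. This is a minimal sketch that keeps the same case-sensitive collation as the hanging repro and only changes the datatype; the variable name @Vc and the varchar(8000) length are assumptions for illustration.

```sql
/* Sketch: same collation as the repro that hangs, but a non-MAX datatype */
DECLARE @Vc varchar(8000) = char(0) + 'a'
SELECT REPLACE(@Vc COLLATE Latin1_General_CS_AS, char(0), '')
```

For a real VARCHAR(MAX) column this only applies if every value fits in 8000 bytes; otherwise the binary collation is likely the more practical choice.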
For anyone reading this in future, the following ways worked fine.
Way 1: Change the collation of the VARCHAR(MAX) column in the UPDATE query to a binary collation, as Martin Smith suggested (please see the accepted answer).
REPLACE(myTextsTable_text COLLATE Latin1_General_100_BIN2, CHAR(0),...
The solution will be as below:
GO
BEGIN TRANSACTION [Tran1]
BEGIN TRY
IF OBJECT_ID('tempdb..#myTextsTable') IS NOT NULL DROP TABLE #myTextsTable;
SELECT myTextsTable_id, myTextsTable_text
INTO #myTextsTable
FROM myTextsTable
WHERE myTextsTable_text LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + CHAR(3) + CHAR(4) + CHAR(5) + CHAR(6) + CHAR(7) + CHAR(8) + CHAR(11) + CHAR(12) + CHAR(14) + CHAR(15) + CHAR(16) + CHAR(17) + CHAR(18) + CHAR(19) + CHAR(20) + CHAR(21) + CHAR(22) + CHAR(23) + CHAR(24) + CHAR(25) + CHAR(26) + CHAR(27) + CHAR(28) + CHAR(29) + CHAR(30) + CHAR(31) + CHAR(127) + ']%';
UPDATE #myTextsTable
SET myTextsTable_text = REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(myTextsTable_text COLLATE Latin1_General_100_BIN2, CHAR(0), ''), CHAR(1), ''), CHAR(2), ''), CHAR(3), ''), CHAR(4), ''), CHAR(5), ''), CHAR(6), ''), CHAR(7), ''), CHAR(8), ''), CHAR(11), ''), CHAR(12), ''), CHAR(14), ''), CHAR(15), ''), CHAR(16), ''), CHAR(17), ''), CHAR(18), ''), CHAR(19), ''), CHAR(20), ''), CHAR(21), ''), CHAR(22), ''), CHAR(23), ''), CHAR(24), ''), CHAR(25), ''), CHAR(26), ''), CHAR(27), ''), CHAR(28), ''), CHAR(29), ''), CHAR(30), ''), CHAR(31), ''), CHAR(127), '')
UPDATE myTextsTable
SET myTextsTable_updated = GETDATE()
,myTextsTable_updatedby = 'As per V87058'
,myTextsTable_text = new.myTextsTable_text
FROM myTextsTable
INNER JOIN #myTextsTable new ON new.myTextsTable_id=myTextsTable.myTextsTable_id
DROP TABLE #myTextsTable;
COMMIT TRANSACTION [Tran1];
END TRY
BEGIN CATCH
ROLLBACK TRANSACTION [Tran1];
--PRINT ERROR_MESSAGE();
END CATCH;
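The same collation change can presumably be applied to the original single UPDATE statement as well, without the temp table. The sketch below shortens the character list to CHAR(0) through CHAR(2) purely for readability; it is an illustration of where the COLLATE clause goes, not the full statement.

```sql
-- Sketch only: the character list is shortened to CHAR(0)..CHAR(2).
-- Extend both the REPLACE chain and the LIKE pattern to the full list used above.
UPDATE myTextsTable
SET myTextsTable_text =
    REPLACE(REPLACE(REPLACE(
        myTextsTable_text COLLATE Latin1_General_100_BIN2, -- binary collation on the innermost argument
        CHAR(0), ''), CHAR(1), ''), CHAR(2), '')
WHERE myTextsTable_text LIKE '%[' + CHAR(0) + CHAR(1) + CHAR(2) + ']%';
```

The COLLATE clause only needs to appear once, on the innermost argument, because the explicit collation carries through the enclosing REPLACE calls.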
Way 2: I created a SQL function that removes these characters with STUFF instead of the REPLACE function.
Note: the function is written for my specific requirement, so it only removes the characters hard-coded in its pattern.
--
Go
CREATE FUNCTION [dbo].RemoveASCIICharactersInRange(@InputString VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
IF @InputString IS NOT NULL
BEGIN
DECLARE @Counter INT, @TestString NVARCHAR(40)
SET @TestString = '%[' + NCHAR(0) + NCHAR(1) + NCHAR(2) + NCHAR(3) + NCHAR(4) + NCHAR(5) + NCHAR(6) + NCHAR(7) + NCHAR(8) + NCHAR(11) + NCHAR(12) + NCHAR(14) + NCHAR(15) + NCHAR(16) + NCHAR(17) + NCHAR(18) + NCHAR(19) + NCHAR(20) + NCHAR(21) + NCHAR(22) + NCHAR(23) + NCHAR(24) + NCHAR(25) + NCHAR(26) + NCHAR(27) + NCHAR(28) + NCHAR(29) + NCHAR(30) + NCHAR(31) + NCHAR(127)+ ']%'
SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
WHILE @Counter <> 0
BEGIN
SELECT @InputString = STUFF(@InputString, @Counter, 1, '')
SELECT @Counter = PATINDEX (@TestString, @InputString COLLATE Latin1_General_BIN)
END
END
RETURN(@InputString)
END
GO
Then the UPDATE query (in my temp table approach) will be something like this:
UPDATE #myTextsTable
SET myTextsTable_text = [dbo].RemoveASCIICharactersInRange(myTextsTable_text)
Go
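Before running the function against the real data, a quick check on a hand-built value may be useful. This is only a sketch: the variable @Sample and its contents are made up for illustration, and it assumes the function above has been created in the current database.

```sql
-- Hypothetical sample value: 'a', BEL (7), 'b', DEL (127), 'c'
DECLARE @Sample VARCHAR(MAX) = 'a' + CHAR(7) + 'b' + CHAR(127) + 'c';

-- Compare the value and its length before and after cleaning
SELECT @Sample AS original_value,
       [dbo].RemoveASCIICharactersInRange(@Sample) AS cleaned_value,
       LEN(@Sample) AS original_length,
       LEN([dbo].RemoveASCIICharactersInRange(@Sample)) AS cleaned_length;
```

If the function behaves as intended, the cleaned length should be smaller than the original length by the number of control characters in the sample.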
My personal preference is the first way.
The problem is probably the nesting of the REPLACE calls, which is reported at execution time rather than at compilation. Check the @@NESTLEVEL function: https://technet.microsoft.com/en-us/library/ms190607(v=sql.105).aspx