
Search for characters in a string and replace them with a blank space in T-SQL

I have a table with a list of invalid characters like:

InVCh
-----

!
"
$
%
&
'
(
)
*
+
,
.
/

Then I have a lot of tables with different numbers of columns (all of those columns are string types), for example:

Product          Store
-------          ------
Prod1            Store1
Pr$od!2          Sto$re!2
P:;()ro!!!"d3    S:;()to!!!"re3

I would like to create a procedure that finds all those invalid characters and replaces each one with a blank space. If several blank spaces end up next to each other, they should be collapsed into a single space. So my expected result is:

Product          Store
-------          ------
Prod1            Store1
Pr od 2          Sto re 2
P ro d3          S to re3

Is this possible?

Thanks!

Since it's SQL Server 2016, using R is an option. This isn't so far-fetched: an MSSQLTips article from 2017, "SQL Server 2016 Regular Expressions with the R Language", describes exactly this.

The article's code isn't that hard either:

create table dbo.tblRegEx (id int identity, a varchar(300), b  varchar(300) );

-- 3. Remove duplicate words
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"\\b(\\w+\\s*)(\\1\\s*)+";
inData$a <- gsub(pattern, "\\1", inData$a, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, a, b from dbo.tblRegEx'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object dbo.tblRegEx);

This question asks for something far easier: just replacing some characters.

create table #products 
(
    id int primary key identity, 
    product varchar(300), 
    store  varchar(300) 
);
go

insert into #products (product,store)
values 
('Prod1',            'Store1'),
('Pr$od!2',          'Sto$re!2'),
('P:;()ro!!!"d3',    'S:;()to!!!"re3')

exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object #products);

As with any stored procedure, the results can only be returned to the client or used as the source for an INSERT INTO. The target could be a staging table, a temporary table, or a table variable, which can then be used to update the source table:

declare @outData table (id int primary key, product varchar(300), store  varchar(300) );

insert into @outData
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";   
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData' 



update #products
set product = r.product,
    store   = r.store
from #products inner join @outData r on r.id=#products.id

select * from #products

This returns:

id  product   store
--  -------   --------
1   Prod1     Store1
2   Pr od 2   Sto re 2
3   P ro d3   S to re3

Since no version was given, I'm assuming you have access to the latest tools. In that case you could use FOR XML PATH to build a string of the characters that need replacing, and then TRANSLATE (available from SQL Server 2017) to get rid of them all:

WITH C AS(
    SELECT *
    FROM (VALUES('!'),
                ('"'),
                ('$'),
                ('%'),
                ('&'),
                (''''),
                ('('),
                (')'),
                ('*'),
                ('+'),
                (','),
                ('.'),
                ('/'))V(InVCh)),
PS AS (
    SELECT *
    FROM (VALUES('Prod1','Store1'),
                ('Pr$od!2','Sto$re!2'),
                ('P:;()ro!!!"d3','S:;()to!!!"re3')) V(Product,Store))
SELECT REPLACE(TRANSLATE(PS.Product,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Product,
        REPLACE(TRANSLATE(PS.Store,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Store
FROM PS
     CROSS APPLY (VALUES((SELECT '' + InVCh
                          FROM C
                          FOR XML PATH(''),TYPE).value('.','varchar(MAX)')))V(C);

db<>fiddle

Note that the return values for the 3rd row are 'P:;rod3' and 'S:;tore3', as neither the semicolon ( ; ) nor the colon ( : ) is in your list of characters to be removed. You'll need to add all the characters you need replacing.
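The query above removes the characters outright, whereas the question asks for a single blank space in their place. A minimal sketch of that variation follows, assuming SQL Server 2017 or later for TRANSLATE. The variable @bad and the alias p are made up for illustration, ':' and ';' have been added to the character list, and the '<' and '>' markers are assumed never to appear in the real data.

```sql
-- Sketch only: @bad and p are hypothetical names, not part of the original schema.
-- Assumes SQL Server 2017+ and that '<' and '>' never occur in the data.
DECLARE @bad varchar(50) = '!"$%&''()*+,./:;';

SELECT p.Product,
       -- 1) TRANSLATE maps every invalid character to a space.
       -- 2) Each space becomes '<>', then every '><' pair is dropped,
       --    so a run of spaces shrinks to one '<>' marker.
       -- 3) The remaining '<>' markers turn back into single spaces.
       REPLACE(REPLACE(REPLACE(
           TRANSLATE(p.Product, @bad, REPLICATE(' ', LEN(@bad))),
           ' ', '<>'),
           '><', ''),
           '<>', ' ') AS Cleaned
FROM (VALUES ('Prod1'),
             ('Pr$od!2'),
             ('P:;()ro!!!"d3')) AS p(Product);
```

If a value starts or ends with an invalid character, the result keeps one leading or trailing space; wrapping the expression in LTRIM(RTRIM(...)) would strip it if that is not wanted.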

It seems the OP has mentioned, in the comments, that they are using 2016 (which is why knowing what version you are using is important!). Using NGrams8K you could do this (it looks messy though):

WITH C AS(
    SELECT *
    FROM (VALUES('!'),
                ('"'),
                ('$'),
                ('%'),
                ('&'),
                (''''),
                ('('),
                (')'),
                ('*'),
                ('+'),
                (','),
                ('.'),
                ('/'))V(InVCh)),
PS AS (
    SELECT *
    FROM (VALUES(1,'Prod1','Store1'),
                (2,'Pr$od!2','Sto$re!2'),
                (3,'P:;()ro!!!"d3','S:;()to!!!"re3')) V(ID,Product,Store))
SELECT PS.Product,V.Product,
       PS.Store,V.Store
FROM PS
     CROSS APPLY (VALUES((SELECT '' + N.token
                          FROM dbo.NGrams8k(PS.Product,1) N
                          WHERE NOT EXISTS (SELECT 1
                                            FROM C
                                            WHERE C.InVCh = N.token)
                          ORDER BY position
                          FOR XML PATH(''),TYPE).value('.','varchar(8000)'),
                         (SELECT '' + N.token
                          FROM dbo.NGrams8k(PS.Store,1) N
                          WHERE NOT EXISTS (SELECT 1
                                            FROM C
                                            WHERE C.InVCh = N.token)
                          ORDER BY position
                          FOR XML PATH(''),TYPE).value('.','varchar(8000)')))V(Product,Store)
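
Like the TRANSLATE version, the NGrams8K query strips the characters rather than replacing them with a space. As a rough alternative that needs neither TRANSLATE nor NGrams8K, the sketch below loops over the invalid-character list and applies REPLACE for each one, then collapses runs of spaces. The table variables @InvalidChars and @Products and the cursor name are hypothetical stand-ins for the real tables, ':' and ';' have been added to the list, and '<' and '>' are assumed not to occur in the data.

```sql
-- Sketch only: @InvalidChars and @Products stand in for the real tables.
DECLARE @InvalidChars table (InVCh char(1) NOT NULL);
INSERT INTO @InvalidChars (InVCh)
VALUES ('!'),('"'),('$'),('%'),('&'),(''''),('('),(')'),
       ('*'),('+'),(','),('.'),('/'),(':'),(';');

DECLARE @Products table (ID int PRIMARY KEY, Product varchar(300), Store varchar(300));
INSERT INTO @Products (ID, Product, Store)
VALUES (1,'Prod1','Store1'),
       (2,'Pr$od!2','Sto$re!2'),
       (3,'P:;()ro!!!"d3','S:;()to!!!"re3');

-- Replace each invalid character with a space, one character per pass.
DECLARE @ch char(1);
DECLARE InvalidCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT InVCh FROM @InvalidChars;
OPEN InvalidCursor;
FETCH NEXT FROM InvalidCursor INTO @ch;
WHILE @@FETCH_STATUS = 0
BEGIN
    UPDATE @Products
    SET Product = REPLACE(Product, @ch, ' '),
        Store   = REPLACE(Store,   @ch, ' ');
    FETCH NEXT FROM InvalidCursor INTO @ch;
END;
CLOSE InvalidCursor;
DEALLOCATE InvalidCursor;

-- Collapse runs of spaces: ' ' -> '<>', drop every '><', then '<>' -> ' '.
-- Assumes '<' and '>' never appear in the data.
UPDATE @Products
SET Product = REPLACE(REPLACE(REPLACE(Product, ' ', '<>'), '><', ''), '<>', ' '),
    Store   = REPLACE(REPLACE(REPLACE(Store,   ' ', '<>'), '><', ''), '<>', ' ');

SELECT ID, Product, Store FROM @Products ORDER BY ID;
```

One pass per character is not the fastest approach on large tables, but it keeps the character list in a table, as in the question, and the same body could be wrapped in a stored procedure.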

