I have a table with a list of invalid characters like:
InVCh
-----
!
"
$
%
&
'
(
)
*
+
,
.
/
Then I have a lot of tables with different numbers of columns (all of those columns are string type), for example:
Product        Store
-------        ------
Prod1          Store1
Pr$od!2        Sto$re!2
P:;()ro!!!"d3  S:;()to!!!"re3
I would like to create a procedure that finds all those invalid characters and replaces each of them with a blank space. If that leaves several blank spaces in a row, they have to be collapsed into a single space. So my expected result would be:
Product  Store
-------  ------
Prod1    Store1
Pr od 2  Sto re 2
P ro d3  S to re3
Is this possible?
Thanks!
Since it's SQL Server 2016, using R is an option. This isn't so far-fetched: a 2017 MSSQLTips article, "SQL Server 2016 Regular Expressions with the R Language", describes exactly this.
The article's code isn't that hard either:
create table dbo.tblRegEx (id int identity, a varchar(300), b varchar(300) );
-- 3. Remove duplicate words
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"\\b(\\w+\\s*)(\\1\\s*)+";
inData$a <- gsub(pattern, "\\1", inData$a, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, a, b from dbo.tblRegEx'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object dbo.tblRegEx);
This question asks for something far easier: just replacing some characters.
create table #products
(
id int primary key identity,
product varchar(300),
store varchar(300)
);
go
insert into #products (product,store)
values
('Prod1', 'Store1'),
('Pr$od!2', 'Sto$re!2'),
('P:;()ro!!!"d3', 'S:;()to!!!"re3')
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object #products);
As with all stored procedures, the results can only be returned to the client or used as the source for an INSERT INTO. The target could be a staging table, a temporary table, or a table variable, which can then be used to update the source table:
declare @outData table (id int primary key, product varchar(300), store varchar(300) );
insert into @outData
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
update #products
set product = r.product,
store = r.store
from #products inner join @outData r on r.id=#products.id
select * from #products
This returns:
id  product  store
--  -------  --------
1   Prod1    Store1
2   Pr od 2  Sto re 2
3   P ro d3  S to re3
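The R examples above hard-code the character class, while the question keeps the invalid characters in a table. A minimal sketch of building the pattern from that table and handing it to the script through @params could look like the following. The table name dbo.InvalidChars is hypothetical (the question only shows the InVCh column), and the sketch assumes none of the stored characters is special inside a regex character class (], \, ^ or -); those would need escaping first.

```sql
-- Assumption: the invalid characters are stored one per row in a
-- hypothetical table dbo.InvalidChars(InVCh).
-- Build a character class such as [!"$%&'()*+,./]+ from the rows.
-- .value() is used so that XML entities such as &amp; are decoded again.
declare @pattern nvarchar(4000) =
    N'[' + (select N'' + InVCh
            from dbo.InvalidChars
            for xml path(''), type).value('.', 'nvarchar(4000)') + N']+';

-- The @pattern parameter is visible inside the R script as "pattern".
exec sp_execute_external_script @language = N'R'
, @script = N'
inData$product <- gsub(pattern, " ", inData$product, perl = TRUE);
inData$store <- gsub(pattern, " ", inData$store, perl = TRUE);
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name = N'outData'
, @params = N'@pattern nvarchar(4000)'
, @pattern = @pattern;
```

Passing the pattern as a parameter also avoids having to escape quotes inside the script text.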
Without the version, I'm assuming you have access to the latest tools. In that case you could use FOR XML PATH to build a string of the characters that need replacing, and then TRANSLATE (available from SQL Server 2017 onwards) to get rid of them all:
WITH C AS(
SELECT *
FROM (VALUES('!'),
('"'),
('$'),
('%'),
('&'),
(''''),
('('),
(')'),
('*'),
('+'),
(','),
('.'),
('/'))V(InVCh)),
PS AS (
SELECT *
FROM (VALUES('Prod1','Store1'),
('Pr$od!2','Sto$re!2'),
('P:;()ro!!!"d3','S:;()to!!!"re3')) V(Product,Store))
SELECT REPLACE(TRANSLATE(PS.Product,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Product,
REPLACE(TRANSLATE(PS.Store,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Store
FROM PS
CROSS APPLY (VALUES((SELECT '' + InVCh
FROM C
FOR XML PATH(''),TYPE).value('.','varchar(MAX)')))V(C);
Note that the return value for the 3rd row is 'P:;rod3' and 'S:;tore3', as neither the semicolon (;) nor the colon (:) is in your list of characters to be removed. You'll need to add every character you want replaced.
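The query above deletes the invalid characters outright, whereas the question asks for a single space in their place. A minimal sketch of that variation follows. It assumes SQL Server 2017 or later (for TRANSLATE), adds the colon and semicolon to the list, and assumes the characters < and > never occur in the data, because they are used as temporary markers when collapsing runs of spaces. The variable names are illustrative only.

```sql
-- Assumptions: SQL Server 2017+; '<' and '>' do not appear in the data.
DECLARE @bad varchar(50) = '!"$%&''()*+,./:;';  -- characters to replace
DECLARE @first char(1) = LEFT(@bad, 1);         -- stand-in for all of them

SELECT V.Product,
       REPLACE(REPLACE(REPLACE(
           -- 1. Map every invalid character to the first one, then turn that into a space
           REPLACE(TRANSLATE(V.Product, @bad, REPLICATE(@first, LEN(@bad))), @first, ' '),
           -- 2. Collapse runs of spaces: ' ' -> '<>', drop each '><', then '<>' -> ' '
           ' ', '<>'), '><', ''), '<>', ' ') AS Cleaned
FROM (VALUES ('Prod1'),
             ('Pr$od!2'),
             ('P:;()ro!!!"d3')) V(Product);
```

For the three sample rows this should give 'Prod1', 'Pr od 2' and 'P ro d3'. The same expression can be repeated for each column that needs cleaning.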
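It seems the OP has mentioned, in the comments, that they are using 2016 (this is why knowing what version you are using is important!). Using NGrams8K (a community-written function, not a built-in one, so it has to be created in your database first) you could do this (it looks messy though):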
WITH C AS(
SELECT *
FROM (VALUES('!'),
('"'),
('$'),
('%'),
('&'),
(''''),
('('),
(')'),
('*'),
('+'),
(','),
('.'),
('/'))V(InVCh)),
PS AS (
SELECT *
FROM (VALUES(1,'Prod1','Store1'),
(2,'Pr$od!2','Sto$re!2'),
(3,'P:;()ro!!!"d3','S:;()to!!!"re3')) V(ID,Product,Store))
SELECT PS.Product,V.Product,
PS.Store,V.Store
FROM PS
CROSS APPLY (VALUES((SELECT '' + N.token
FROM dbo.NGrams8k(PS.Product,1) N
WHERE NOT EXISTS (SELECT 1
FROM C
WHERE C.InVCh = N.token)
ORDER BY position
FOR XML PATH(''),TYPE).value('.','varchar(8000)'),
(SELECT '' + N.token
FROM dbo.NGrams8k(PS.Store,1) N
WHERE NOT EXISTS (SELECT 1
FROM C
WHERE C.InVCh = N.token)
ORDER BY position
FOR XML PATH(''),TYPE).value('.','varchar(8000)')))V(Product,Store)