Search for characters in a string and replace them with a blank space in T-SQL

Question

I have a table with a list of invalid characters, like this:

InVCh
-----

!
"
$
%
&
'
(
)
*
+
,
.
/

Then I have a lot of tables with different numbers of columns (all of those columns are string types), for example:

Product          Store
-------          ------
Prod1            Store1
Pr$od!2          Sto$re!2
P:;()ro!!!"d3    S:;()to!!!"re3

I would like to create a procedure that finds all those invalid characters and replaces them with a blank space. If that leaves several blank spaces in a row, they have to be collapsed into a single space. So my expected result would be:

Product          Store
-------          ------
Prod1            Store1
Pr od 2          Sto re 2
P ro d3          S to re3

Is this possible?

Thanks!

Answer 1

Since it's SQL Server 2016, using R is an option. This isn't so far-fetched: a 2017 MSSQLTips article, "SQL Server 2016 Regular Expressions with the R Language", describes exactly this.

The article's code isn't that hard either:

create table dbo.tblRegEx (id int identity, a varchar(300), b  varchar(300) );

-- 3. Remove duplicate words
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"\\b(\\w+\\s*)(\\1\\s*)+";
inData$a <- gsub(pattern, "\\1", inData$a, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, a, b from dbo.tblRegEx'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object dbo.tblRegEx);

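This question asks for something far easier: just replacing some characters. The `+` at the end of the character class matches a whole run of invalid characters at once, so each run becomes a single space.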

create table #products 
(
    id int primary key identity, 
    product varchar(300), 
    store  varchar(300) 
);
go

insert into #products (product,store)
values 
('Prod1',            'Store1'),
('Pr$od!2',          'Sto$re!2'),
('P:;()ro!!!"d3',    'S:;()to!!!"re3')

exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData'
with result sets ( as object #products);

As with all stored procedures, the results can only be returned to the client or used as the source for an INSERT INTO. The target could be a staging table, a temporary table, or a table variable, which can then be used to update the source table:

declare @outData table (id int primary key, product varchar(300), store  varchar(300) );

insert into @outData
exec sp_execute_external_script @language=N'R'
, @script = N'
pattern <-"[!\"$%&''()*+,./:;]+";   
inData$product <- gsub(pattern, " ", inData$product, perl = T );
inData$store <- gsub(pattern, " ", inData$store, perl = T );
outData <- inData;'
, @input_data_1 = N'select id, product, store from #products'
, @input_data_1_name = N'inData'
, @output_data_1_name=N'outData' 



update #products
set product = r.product,
    store   = r.store
from #products inner join @outData r on r.id=#products.id

select * from #products

This returns:

id  product   store
--  -------   --------
1   Prod1     Store1
2   Pr od 2   Sto re 2
3   P ro d3   S to re3
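If enabling R Services is not an option, the same idea can be sketched in plain T-SQL that runs on SQL Server 2016. This is only an illustration, not part of the answer above: the table variable @InvalidChars stands in for the asker's table of invalid characters (its name is an assumption), and `:` and `;` have been added to the list so that the sample value comes out clean. A cursor replaces each invalid character with a space, and a loop then collapses runs of spaces.

```sql
-- Hypothetical stand-in for the table of invalid characters (name is an assumption).
DECLARE @InvalidChars table (InVCh char(1) NOT NULL);
INSERT INTO @InvalidChars (InVCh)
VALUES ('!'),('"'),('$'),('%'),('&'),(''''),('('),(')'),
       ('*'),('+'),(','),('.'),('/'),(':'),(';');

DECLARE @value varchar(300) = 'P:;()ro!!!"d3';
DECLARE @ch char(1);

-- Step 1: replace every invalid character with a single space.
DECLARE ch_cursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT InVCh FROM @InvalidChars;
OPEN ch_cursor;
FETCH NEXT FROM ch_cursor INTO @ch;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @value = REPLACE(@value, @ch, ' ');
    FETCH NEXT FROM ch_cursor INTO @ch;
END;
CLOSE ch_cursor;
DEALLOCATE ch_cursor;

-- Step 2: collapse runs of spaces until no double space is left.
WHILE CHARINDEX('  ', @value) > 0
    SET @value = REPLACE(@value, '  ', ' ');

SELECT @value AS cleaned;
```

Wrapped in a scalar function, this could be applied per column, but a row-by-row function is likely to be slow on large tables, so treat it as a fallback rather than a recommendation.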

Answer 2

Since no version was given, I'm assuming you have access to the latest tools. In that case you could use FOR XML PATH to build a string of the characters that need replacing, and then TRANSLATE (available from SQL Server 2017) to get rid of them all:

WITH C AS(
    SELECT *
    FROM (VALUES('!'),
                ('"'),
                ('$'),
                ('%'),
                ('&'),
                (''''),
                ('('),
                (')'),
                ('*'),
                ('+'),
                (','),
                ('.'),
                ('/'))V(InVCh)),
PS AS (
    SELECT *
    FROM (VALUES('Prod1','Store1'),
                ('Pr$od!2','Sto$re!2'),
                ('P:;()ro!!!"d3','S:;()to!!!"re3')) V(Product,Store))
SELECT REPLACE(TRANSLATE(PS.Product,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Product,
        REPLACE(TRANSLATE(PS.Store,V.C,REPLICATE(LEFT(V.C,1),LEN(V.C))),LEFT(V.C,1),'') AS Store
FROM PS
     CROSS APPLY (VALUES((SELECT '' + InVCh
                          FROM C
                          FOR XML PATH(''),TYPE).value('.','varchar(MAX)')))V(C);

db<>fiddle

Note that the return value for the 3rd row is 'P:;rod3' and 'S:;tore3', as neither the semicolon (;) nor the colon (:) is in your list of characters to be removed. You'll need to add all the characters you need replacing.
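The query above removes the invalid characters outright, whereas the question asks for a single blank space in their place. A minimal sketch of that variant follows, again assuming SQL Server 2017 or later for TRANSLATE. The variable @invalid is a hard-coded stand-in for the concatenated character list (with `:` and `;` added), and the '<>' marker trick assumes that neither '<' nor '>' occurs in the data.

```sql
-- Sketch only: requires SQL Server 2017+ for TRANSLATE.
-- @invalid is an assumed, hard-coded copy of the invalid-character list.
DECLARE @invalid varchar(100) = '!"$%&''()*+,./:;';

SELECT v.Product,
       REPLACE(REPLACE(REPLACE(
           -- map every invalid character to a space
           TRANSLATE(v.Product, @invalid, REPLICATE(' ', LEN(@invalid))),
           ' ', '<>'),   -- mark each space
           '><', ''),    -- drop the joins between adjacent markers
           '<>', ' ')    -- turn each surviving marker back into one space
       AS CleanProduct
FROM (VALUES ('Prod1'), ('Pr$od!2'), ('P:;()ro!!!"d3')) AS v(Product);
```

The three nested REPLACE calls collapse any run of spaces into one without a loop: a run of n spaces becomes n '<>' markers, removing every '><' leaves a single '<>', and that marker is then turned back into one space.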

It seems the OP has mentioned, in the comments, that they are using 2016 (which is why knowing what version you are using is important!). Using NGrams8K you could do this (it looks messy, though):

WITH C AS(
    SELECT *
    FROM (VALUES('!'),
                ('"'),
                ('$'),
                ('%'),
                ('&'),
                (''''),
                ('('),
                (')'),
                ('*'),
                ('+'),
                (','),
                ('.'),
                ('/'))V(InVCh)),
PS AS (
    SELECT *
    FROM (VALUES(1,'Prod1','Store1'),
                (2,'Pr$od!2','Sto$re!2'),
                (3,'P:;()ro!!!"d3','S:;()to!!!"re3')) V(ID,Product,Store))
SELECT PS.Product,V.Product,
       PS.Store,V.Store
FROM PS
     CROSS APPLY (VALUES((SELECT '' + N.token
                          FROM dbo.NGrams8k(PS.Product,1) N
                          WHERE NOT EXISTS (SELECT 1
                                            FROM C
                                            WHERE C.InVCh = N.token)
                          ORDER BY position
                          FOR XML PATH(''),TYPE).value('.','varchar(8000)'),
                         (SELECT '' + N.token
                          FROM dbo.NGrams8k(PS.Store,1) N
                          WHERE NOT EXISTS (SELECT 1
                                            FROM C
                                            WHERE C.InVCh = N.token)
                          ORDER BY position
                          FOR XML PATH(''),TYPE).value('.','varchar(8000)')))V(Product,Store)

2 answers

Answer 1: score 2, accepted, 2019-04-05 09:08:44

Answer 2: score -1, 2019-04-05 08:39:20
