SQL Server: generate data using a regex pattern
I would like to generate data from a given regex pattern in SQL Server. Is there any way to do this? Say I have the patterns below and would like to generate data as follows:
The idea behind this is SQL static data masking (a feature that has since been removed). Our client wants to mask the production data in a test database. We no longer have the static data masking feature in SQL Server, but we do have a pattern for masking each column, so my thinking is that we could run an update query driven by these patterns.
SELECT "(\d){7}" AS RandomNumber, "(\W){5}" AS RandomString FROM tbl
The output should be:
+---------------+--------------+
| RandomNumber  | RandomString |
+---------------+--------------+
| 7894562       | AHJIL        |
+---------------+--------------+
| 9632587       | ZLOKP        |
+---------------+--------------+
| 4561238       | UJIOK        |
+---------------+--------------+
Apart from these regular patterns, I have some customized patterns such as Test_Product_(\\d){1,4}, which should give results like the following:
Test_Product_012
Test_Product_143
Test_Product_8936
The complete list of patterns I am going to use for masking:
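+----------------------------+--------------------------------+
| Other patterns             | Samples                        |
+----------------------------+--------------------------------+
| (\l){30}                   | ahukoklijfahukokponmahukoahuko |
| (\d){7}                    | 7895623                        |
| (\W){5}                    | ABCDEF                         |
| Test_Product_(\d){1,4}     | Test_Product_007               |
| 0\.(\d){2}                 | 0.59                           |
| https://www\.(\l){10}\.com | https://www.anything.com       |
+----------------------------+--------------------------------+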
I'm not convinced you need a regex for this. Why not just use a "scrub script" and take advantage of the newid() function to generate a bunch of random data? It looks like you'll need to write such a script anyway, regex or not, and this has the benefit of being very simple.
Let's say you start with the following data:
create table tbl (PersonalId int, Name varchar(max))
insert into tbl select 300300, 'Michael'
insert into tbl select 554455, 'Tim'
insert into tbl select 228899, 'John'
select * from tbl
Then run your script:
update tbl set PersonalId = cast(rand(checksum(newid())) * 1000000 as int)
update tbl set Name = left(convert(varchar(255), newid()), 6)
select * from tbl
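If the scrubbed values need to follow a fixed shape such as (\d){7} or (\W){5}, the same newid() trick can probably be adapted. The following is only a sketch under that assumption, reusing the tbl table from above; the column aliases are made up for illustration. Each newid() call is evaluated separately, so every character is drawn independently.

```sql
select
    -- seven random digits, left-padded with zeros so the length is always 7
    right('0000000' + cast(abs(checksum(newid()) % 10000000) as varchar(7)), 7) as RandomNumber,
    -- five random upper-case letters: 65 is the character code of 'A'
    char(65 + abs(checksum(newid()) % 26))
  + char(65 + abs(checksum(newid()) % 26))
  + char(65 + abs(checksum(newid()) % 26))
  + char(65 + abs(checksum(newid()) % 26))
  + char(65 + abs(checksum(newid()) % 26)) as RandomString
from tbl
```

The modulo is applied before abs() so that the rare checksum value of -2147483648 cannot cause an overflow.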
Well, I can give you a solution that is based not on regular expressions but on a set of parameters, and it covers all of your requirements.
I've based this solution on a user-defined function I wrote to generate random strings (you can read my blog post about it here). I've just changed it so that it can generate the masks you want based on the following conditions:
I decided on this set of rules based on your update to the question, which contains your desired masks:
(\\d){7}                       7895623
(\\W){5}                       ABCDEF
Test_Product_(\\d){1,4}        Test_Product_007
0\\.(\\d){2}                   0.59
https://www\\.(\\l){10}\\.com  https://www.anything.com
And now, for the code:
Since I'm using a user-defined function, I can't call the built-in NewId() function inside it, so we first need to create a view that generates the GUID for us:
CREATE VIEW GuidGenerator
AS
SELECT Newid() As NewGuid;
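To illustrate the workaround in isolation: SQL Server rejects a direct NEWID() call inside a user-defined function, but it does allow the function to select from a view that calls it. The helper below is a hypothetical example (dbo.RandomDigit is not part of the original solution) and assumes the GuidGenerator view above already exists.

```sql
-- Hypothetical helper, for illustration only
CREATE FUNCTION dbo.RandomDigit()
RETURNS int
AS
BEGIN
    -- NEWID() cannot be called here directly; reading it through the view is allowed
    RETURN (SELECT ABS(CHECKSUM(NewGuid) % 10) FROM GuidGenerator);
END;
GO  -- batch separator (SSMS / sqlcmd): CREATE FUNCTION must be alone in its batch

SELECT dbo.RandomDigit() AS Digit;  -- a value between 0 and 9
```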
In the masking function, we're going to use that view to generate a NewID() as the base of all randomness.
The function itself is a lot more cumbersome than the random string generator I started from:
CREATE FUNCTION dbo.MaskGenerator
(
    -- use null or an empty string for no prefix
    @Prefix nvarchar(4000),
    -- use null or an empty string for no suffix
    @Suffix nvarchar(4000),
    -- the minimum length of the random part
    @MinLength int,
    -- the maximum length of the random part
    @MaxLength int,
    -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @Count int,
    -- 1, 2 and 4 stand for lower-case, upper-case and digits.
    -- a bitwise combination of these values can be used to generate all possible combinations:
    -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
    @CharType tinyint
)
RETURNS TABLE
AS
RETURN
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)), -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) -- 1,000,000
SELECT TOP(@Count)
    n As Number,
    CONCAT(@Prefix, (
        SELECT TOP (Length)
            -- choose what char combination to use for the random part
            CASE @CharType
                WHEN 1 THEN Lower
                WHEN 2 THEN Upper
                WHEN 3 THEN IIF(Rnd % 2 = 0, Lower, Upper)
                WHEN 4 THEN Digit
                WHEN 5 THEN IIF(Rnd % 2 = 0, Lower, Digit)
                WHEN 6 THEN IIF(Rnd % 2 = 0, Upper, Digit)
                WHEN 7 THEN
                    CASE Rnd % 3
                        WHEN 0 THEN Lower
                        WHEN 1 THEN Upper
                        ELSE Digit
                    END
            END
        FROM Tally As t0
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
                   CHAR(65 + Rnd % 26) As Upper, -- Random upper case letter
                   CHAR(48 + Rnd % 10) As Digit  -- Random digit
        ) As Chars
        WHERE t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('')
    ), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 n As Length
    FROM Tally As t2
    CROSS JOIN GuidGenerator
    WHERE t2.n >= @MinLength
    AND t2.n <= @MaxLength
    AND t2.n <> t1.n
    ORDER BY NewGuid
) As Lengths;
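The @CharType parameter is a set of bit flags, so combined values can be built with the bitwise OR operator instead of being memorized. A small sketch, assuming the function above has been created; the variable names are made up for illustration:

```sql
-- Flag values taken from the parameter comments in dbo.MaskGenerator
DECLARE @Lower tinyint = 1, @Upper tinyint = 2, @Digit tinyint = 4;

-- Bitwise OR combines the flags: 1 | 4 = 5 (lower-case letters and digits)
DECLARE @LowerAndDigits tinyint = @Lower | @Digit;

-- Three rows, each an 8-character string of lower-case letters and digits
SELECT Number, RandomString
FROM dbo.MaskGenerator(null, null, 8, 8, 3, @LowerAndDigits);
```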
(\\l){30} - ahukoklijfahukokponmahukoahuko
SELECT RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1);
Results:
1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp
(\\d){7} - 7895623
SELECT RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4);
Results:
1, 8744412
2, 2275313
(\\W){5} - ABCDE
SELECT RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2);
Results:
1, RSYJE
2, MMFAA
Test_Product_(\\d){1,4} - Test_Product_007
SELECT RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4);
Results:
1, Test_Product_933
2, Test_Product_7
0\\.(\\d){2} - 0.59
SELECT RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4);
Results:
1, 0.68
2, 0.70
https://www\\.(\\l){10}\\.com - https://www.anything.com
SELECT RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1);
Results:
1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com
Here's how you can use it to generate masked values for several columns at once:
DECLARE @Count int = 10;
SELECT CAST(IntVal.RandomString As Int) As IntColumn,
UpVal.RandomString as UpperCaseValue,
LowVal.RandomString as LowerCaseValue,
MixVal.RandomString as MixedValue,
WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
ON IntVal.Number = WithPrefix.Number
Results:
IntColumn UpperCaseValue LowerCaseValue MixedValue PrefixedValue
674 CCNVSDI esjyyesesv O2FAC7bfwg2Be5a91Q0 Test 4935
30732 UJKSL jktisddbnq 7o8B91Sg1qrIZSvG3AcL Test 0
4669472 HDLJNBWPJ qgtfkjdyku xUoLAZ4pAnpn Test 8
26347 DNAKERR vlehbnampb NBv08yJdKb75ybhaFqED Test 91
6084965 LJPMZMEU ccigzyfwnf MPxQ2t8jjmv0IT45yVcR Test 4
6619851 FEHKGHTUW wswuefehsp 40n7Ttg7H5YtVPF Test 848
781 LRWKVDUV bywoxqizju UxIp2O4Jb82Ts Test 6268
52237 XXNPBL beqxrgstdo Uf9j7tCB4W2 Test 43
876150 ZDRABW fvvinypvqa uo8zfRx07s6d0EP Test 7
Note that this is a fast process: generating 1000 rows with 5 columns took less than half a second on average in the tests I ran.
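To write the generated values back into an existing table, one option is to number the rows and join them to the function's Number column. The following is a sketch only: dbo.Product and ProductName are hypothetical names, the table is assumed to hold no more than 1,000,000 rows (the function's limit), and the function is assumed to return Number values 1 through @Count.

```sql
-- Hypothetical table and column names, for illustration only
DECLARE @Count int = (SELECT COUNT(*) FROM dbo.Product);

WITH Numbered AS
(
    -- give every existing row an arbitrary sequence number
    SELECT ProductName,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM dbo.Product
)
UPDATE n
SET ProductName = m.RandomString
FROM Numbered AS n
JOIN dbo.MaskGenerator('Test_Product_', null, 1, 4, @Count, 4) AS m
    ON m.Number = n.rn;  -- pair each row with one generated mask
```

Because the values are random, duplicates are possible, so a column with a unique constraint would need extra handling.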