SQL SERVER generate data using Regex pattern

I would like to generate data from a given regex pattern in SQL Server. Is there any way to do this? Say I have the patterns below and would like to generate data as follows:

The idea behind the concept is SQL Static Data Masking (a feature that has since been removed). Our client wants to mask the production data in the test database. We no longer have the Static Data Masking feature in SQL Server, but we do have patterns for masking each column, so my thinking is that we can run an update query based on these patterns.

SELECT "(\d){7}" AS RandomNumber, "(\W){5}" AS RandomString FROM tbl

Output should be:

  +---------------+--------------+
  |  RandomNumber | RandomString |
  +---------------+--------------+
  |  7894562      | AHJIL        |
  +---------------+--------------+
  |  9632587      | ZLOKP        |
  +---------------+--------------+
  |  4561238      | UJIOK        |
  +---------------+--------------+

Apart from these regular patterns, I have some customized patterns like Test_Product_(\d){1,4}, which should give results like the following:

Test_Product_012 
Test_Product_143
Test_Product_8936

Complete list of patterns that I am going to use for masking:

Other Patterns                Samples
(\l){30}                      ahukoklijfahukokponmahukoahuko
(\d){7}                       7895623
(\W){5}                       ABCDE
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

I'm not convinced you need a Regex for this. Why not just use a "scrub script" and take advantage of the newid() function to generate a bunch of random data? It looks like you'll need to write such a script anyway, Regex or not, and this has the benefit of being very simple.

Let's say you start with the following data:

create table tbl (PersonalId int, Name varchar(max))

insert into tbl select 300300, 'Michael'
insert into tbl select 554455, 'Tim'
insert into tbl select 228899, 'John'

select * from tbl


Then run your script:

update tbl set PersonalId = cast(rand(checksum(newid())) * 1000000 as int)
update tbl set Name = left(convert(varchar(255), newid()), 6)

select * from tbl
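
If the masked values need to follow the shape of the patterns in the question rather than just be random, the same newid() trick can be combined with a little string padding. The following is only a sketch, assuming the tbl table from above; the column aliases are made up for illustration, and the Test_Product_ expression never produces leading zeros such as 007:

```sql
select
    -- (\d){7}: seven digits, left-padded with zeros
    right('0000000' + cast(abs(checksum(newid()) % 10000000) as varchar(7)), 7) as SevenDigits,
    -- Test_Product_(\d){1,4}: a number between 0 and 9999, so one to four digits
    'Test_Product_' + cast(abs(checksum(newid()) % 10000) as varchar(4)) as ProductName,
    -- 0\.(\d){2}: two digits after the decimal point
    '0.' + right('00' + cast(abs(checksum(newid()) % 100) as varchar(2)), 2) as Fraction
from tbl
```

Taking the modulo before abs() keeps the value inside the int range, so abs() cannot overflow on the smallest possible checksum.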


Well, I can give you a solution that is not based on regular expressions but on a set of parameters, and it covers all of your requirements.
I've based this solution on a user-defined function I've written to generate random strings (you can read my blog post about it here). I've just changed it so that it can generate the masks you want, based on the following conditions:

  • The mask has an optional prefix.
  • The mask has an optional suffix.
  • The mask has a variable-length random string.
  • The random string can contain lower-case letters, upper-case letters, digits, or any combination of these.

I've decided on this set of rules based on your update to the question, which contains your desired masks:

(\d){7}                       7895623
(\W){5}                       ABCDE
Test_Product_(\d){1,4}        Test_Product_007
0\.(\d){2}                    0.59
https://www\.(\l){10}\.com    https://www.anything.com

And now, for the code:
Since I'm using a user-defined function, I can't use the built-in NewId() function inside it, so we first need to create a view to generate the guid for us:

CREATE VIEW GuidGenerator
AS
    SELECT Newid() As NewGuid;

In the function, we're going to use that view to generate a NewID() as the base of all randomness.
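
As a small illustration of this workaround, here is a sketch of a scalar function that reads from the view. dbo.RandomDigit is a hypothetical name used only for this example and is not part of the solution; GO is the batch separator used by SSMS and sqlcmd.

```sql
-- Hypothetical helper: returns one random digit per call.
-- Calling NEWID() directly in here is not allowed, so the guid comes from the view.
CREATE FUNCTION dbo.RandomDigit()
RETURNS char(1)
AS
BEGIN
    RETURN
    (
        -- 48 is the ASCII code of '0'; the modulo is taken before ABS to avoid overflow
        SELECT CHAR(48 + ABS(CHECKSUM(NewGuid) % 10))
        FROM GuidGenerator
    );
END;
GO

-- Usage: one digit per row of the sample table
SELECT dbo.RandomDigit() AS Digit
FROM tbl;
```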

The function itself is a lot more cumbersome than the random string generator I started from:

CREATE FUNCTION dbo.MaskGenerator
(
    -- use null or an empty string for no prefix
    @Prefix nvarchar(4000), 
    -- use null or an empty string for no suffix
    @Suffix nvarchar(4000), 
    -- the minimum length of the random part
    @MinLength int, 
    -- the maximum length of the random part
    @MaxLength int, 
    -- the maximum number of rows to return. Note: up to 1,000,000 rows
    @Count int, 
    -- 1, 2 and 4 stand for lower-case, upper-case and digits. 
    -- a bitwise combination of these values can be used to generate all possible combinations:
    -- 3: lower and upper, 5: lower and digits, 6: upper and digits, 7: lower, upper and digits
    @CharType tinyint 
)
RETURNS TABLE
AS 
RETURN 
-- An inline tally table with 1,000,000 rows
WITH E1(N) AS (SELECT N FROM (VALUES (1), (2), (3), (4), (5), (6), (7), (8), (9), (10)) V(N)),   -- 10
     E2(N) AS (SELECT 1 FROM E1 a, E1 b), --100
     E3(N) AS (SELECT 1 FROM E2 a, E2 b), --10,000
     Tally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY @@SPID) FROM E3 a, E2 b) --1,000,000

SELECT TOP(@Count) 
        n As Number, 
        CONCAT(@Prefix, (
        SELECT  TOP (Length) 
                -- choose what char combination to use for the random part
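                -- the selector uses Rnd / 260 so that it is independent of
                -- Rnd % 26 and Rnd % 10, which pick the character itself
                -- (with Rnd % 2, even values could only yield every other letter)
                CASE @CharType 
                    WHEN 1 THEN Lower
                    WHEN 2 THEN Upper
                    WHEN 3 THEN IIF((Rnd / 260) % 2 = 0, Lower, Upper)
                    WHEN 4 THEN Digit
                    WHEN 5 THEN IIF((Rnd / 260) % 2 = 0, Lower, Digit)
                    WHEN 6 THEN IIF((Rnd / 260) % 2 = 0, Upper, Digit)
                    WHEN 7 THEN 
                        CASE (Rnd / 260) % 3
                            WHEN 0 THEN Lower
                            WHEN 1 THEN Upper
                            ELSE Digit
                        END
                END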
        FROM Tally As t0  
        -- create a random number from the guid using the GuidGenerator view
        CROSS APPLY (SELECT Abs(Checksum(NewGuid)) As Rnd FROM GuidGenerator) As rand
        CROSS APPLY
        (
            -- generate a random lower-case char, upper-case char and digit
            SELECT  CHAR(97 + Rnd % 26) As Lower, -- Random lower case letter
                    CHAR(65 + Rnd % 26) As Upper,-- Random upper case letter
                    CHAR(48 + Rnd % 10) As Digit -- Random digit
        ) As Chars
        WHERE  t0.n <> -t1.n -- Needed for the subquery to get re-evaluated for each row
        FOR XML PATH('') 
        ), @Suffix) As RandomString
FROM Tally As t1
CROSS APPLY
(
    -- Select a random length between @MinLength and @MaxLength (inclusive)
    SELECT TOP 1 n As Length
    FROM Tally As t2
    CROSS JOIN GuidGenerator 
    WHERE t2.n >= @MinLength
    AND t2.n <= @MaxLength
    AND t2.n <> -t1.n -- always true; only correlates the subquery with t1 so it is re-evaluated for each row
    ORDER BY NewGuid
) As Lengths;

And finally, test cases:

(\l){30} - ahukoklijfahukokponmahukoahuko

SELECT Number, RandomString FROM dbo.MaskGenerator(null, null, 30, 30, 2, 1);

Results:

1, eyrutkzdugogyhxutcmcmplvzofser
2, juuyvtzsvmmcdkngnzipvsepviepsp

(\d){7} - 7895623

SELECT Number, RandomString FROM dbo.MaskGenerator(null, null, 7, 7, 2, 4);

Results:

1, 8744412
2, 2275313

(\W){5} - ABCDE

SELECT Number, RandomString FROM dbo.MaskGenerator(null, null, 5, 5, 2, 2);

Results:

1, RSYJE
2, MMFAA

Test_Product_(\d){1,4} - Test_Product_007

SELECT Number, RandomString FROM dbo.MaskGenerator('Test_Product_', null, 1, 4, 2, 4);

Results:

1, Test_Product_933
2, Test_Product_7

0\.(\d){2} - 0.59

SELECT Number, RandomString FROM dbo.MaskGenerator('0.', null, 2, 2, 2, 4);

Results:

1, 0.68
2, 0.70

https://www\.(\l){10}\.com - https://www.anything.com

SELECT Number, RandomString FROM dbo.MaskGenerator('https://www.', '.com', 10, 10, 2, 1);

Results:

1, https://www.xayvkmkuci.com
2, https://www.asbfcvomax.com       

Here's how you can use it to generate masked values for several columns at once:

DECLARE @Count int = 10; 

SELECT  CAST(IntVal.RandomString As Int) As IntColumn, 
        UpVal.RandomString as UpperCaseValue, 
        LowVal.RandomString as LowerCaseValue, 
        MixVal.RandomString as MixedValue,
        WithPrefix.RandomString As PrefixedValue
FROM dbo.MaskGenerator(null, null, 3, 7, @Count, 4) As IntVal
JOIN dbo.MaskGenerator(null, null, 10, 10, @Count, 1) As LowVal
    ON IntVal.Number = LowVal.Number
JOIN dbo.MaskGenerator(null, null, 5, 10, @Count, 2) As UpVal
    ON IntVal.Number = UpVal.Number
JOIN dbo.MaskGenerator(null, null, 10, 20, @Count, 7) As MixVal
    ON IntVal.Number = MixVal.Number
JOIN dbo.MaskGenerator('Test ', null, 1, 4, @Count, 4) As WithPrefix
    ON IntVal.Number = WithPrefix.Number

Results:

IntColumn   UpperCaseValue  LowerCaseValue  MixedValue              PrefixedValue
674         CCNVSDI         esjyyesesv      O2FAC7bfwg2Be5a91Q0     Test 4935
30732       UJKSL           jktisddbnq      7o8B91Sg1qrIZSvG3AcL    Test 0
4669472     HDLJNBWPJ       qgtfkjdyku      xUoLAZ4pAnpn            Test 8
26347       DNAKERR         vlehbnampb      NBv08yJdKb75ybhaFqED    Test 91
6084965     LJPMZMEU        ccigzyfwnf      MPxQ2t8jjmv0IT45yVcR    Test 4
6619851     FEHKGHTUW       wswuefehsp      40n7Ttg7H5YtVPF         Test 848
781         LRWKVDUV        bywoxqizju      UxIp2O4Jb82Ts           Test 6268
52237       XXNPBL          beqxrgstdo      Uf9j7tCB4W2             Test 43
876150      ZDRABW          fvvinypvqa      uo8zfRx07s6d0EP         Test 7

Note that this is a fast process: generating 1000 rows with 5 columns took less than half a second on average in the tests I ran.
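
To write the generated values back into an existing table, the rows can be numbered and joined to the function output on that number. The following is only a sketch under stated assumptions: it targets the sample table tbl (PersonalId int, Name varchar(max)) used earlier on this page, the CTE name Numbered and the aliases are made up for illustration, and the order in which rows receive values is arbitrary. Columns with keys, unique indexes or other constraints would need extra care.

```sql
-- One generated value per row of the target table (the function supports up to 1,000,000 rows)
DECLARE @Count int = (SELECT COUNT(*) FROM tbl);

WITH Numbered AS
(
    -- Give every row a number to join on; the order does not matter for masking
    SELECT PersonalId,
           Name,
           ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS RowNum
    FROM tbl
)
UPDATE n
SET    PersonalId = CAST(IdMask.RandomString AS int), -- 3 to 6 random digits
       Name       = NameMask.RandomString             -- 5 to 8 upper-case letters
FROM Numbered AS n
JOIN dbo.MaskGenerator(null, null, 3, 6, @Count, 4) AS IdMask
    ON IdMask.Number = n.RowNum
JOIN dbo.MaskGenerator(null, null, 5, 8, @Count, 2) AS NameMask
    ON NameMask.Number = n.RowNum;

SELECT * FROM tbl;
```

Updating through the CTE changes the underlying tbl rows, because the CTE selects from a single base table and only its base columns are assigned.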
