T-SQL query to identify varchar fields that consist of a single repeating char/digit?

I need to clean phone numbers stored as varchar. There is bad data where unknown phone numbers are stored as a sequence of a single digit. Eventually more complex checks (area code and prefix matching) will be done, but for now I want a simple query to find the obviously bad records.

So for example:

Valid Phone Number: 3289903829

Invalid Phone Number: 1111111111

Now if the bogus phone numbers are the proper length (10 digits), they are easy to find and clean:

    UPDATE customers
    SET phone = NULL
    WHERE phone IN ('0000000000','9999999999','8888888888','7777777777','6666666666','5555555555','4444444444','3333333333','2222222222','1111111111')

However, sometimes the bogus phone numbers are of arbitrary length (likely due to typos), so 11 ones or 9 ones, or n ones.
How can I identify strings that consist entirely of the same char/digit?

1111111 - match
4444 - match
1112 - no match
4445555 - no match 

You can get the first character and replicate it:

where phone = replicate(left(phone,1), len(phone))
    and phone is not null
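
As a quick check of that predicate against the examples from the question, here is a minimal sketch. The table variable `@samples` and its rows are made up for illustration. The empty-string filter is an added assumption: `REPLICATE` with a length of 0 returns an empty string, so an empty `phone` would otherwise compare equal to itself and match.

```sql
-- Hypothetical sample data; the values come from the examples in the question.
DECLARE @samples TABLE (phone VARCHAR(20) NULL);

INSERT INTO @samples (phone)
VALUES ('1111111'), ('4444'), ('1112'), ('4445555'), (''), (NULL);

-- Rows whose value is its first character repeated for the full length.
SELECT phone
FROM @samples
WHERE phone = REPLICATE(LEFT(phone, 1), LEN(phone))
  AND phone <> '';   -- assumption: empty strings should not count as matches
```

With this data the query should return only `1111111` and `4444`. The NULL row drops out because a comparison with NULL is never true.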

Depending on how fast you need it to run, your other option is to populate a temp table and then join your phone numbers against it. If you are doing this multiple times, you could even create a real table so you don't have to re-create it on each run. To make it faster you could also index the field. Your mileage may vary depending on how fast you need it to be and how many records you have to compare.

-- Lookup table holding every repeated-digit string we want to flag
CREATE TABLE #Numbers
(
    PhoneNumber VARCHAR(13) NOT NULL
)

DECLARE @number INT
SET @number = 0   -- start at 0 so that strings of zeros are included

DECLARE @Counter INT

-- Outer loop: each digit 0 through 9
WHILE (@number <= 9)
BEGIN
    SET @Counter = 1

    -- Inner loop: each length from 1 through 12
    WHILE (@Counter < 13)
    BEGIN
        INSERT INTO #Numbers
        SELECT REPLICATE(CAST(@number AS VARCHAR(1)), @Counter)

        SET @Counter = @Counter + 1
    END

    SET @number = @number + 1
END

-- customers.phone as in the question
SELECT * FROM customers c JOIN #Numbers n ON c.phone = n.PhoneNumber

This way you don't have to recalculate the value you are comparing against for every row.
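
If the check runs repeatedly, the "real table" idea could look roughly like the sketch below. The table name `dbo.RepeatedDigitNumbers` is hypothetical, and the set-based fill is a swapped-in alternative to the loop, not the approach shown above. It covers digits 0 through 9 (0 is in the question's list) and lengths 1 through 12, the same lengths the loop produces.

```sql
-- Hypothetical permanent lookup table; the name is illustrative.
CREATE TABLE dbo.RepeatedDigitNumbers
(
    PhoneNumber VARCHAR(13) NOT NULL PRIMARY KEY  -- the primary key gives the join an index
);

-- Populate in one statement: every digit crossed with every length.
INSERT INTO dbo.RepeatedDigitNumbers (PhoneNumber)
SELECT REPLICATE(d.digit, l.n)
FROM (VALUES ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9')) AS d(digit)
CROSS JOIN (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) AS l(n);

-- Flag the bad rows; customers.phone is taken from the question.
SELECT c.phone
FROM customers AS c
JOIN dbo.RepeatedDigitNumbers AS r ON c.phone = r.PhoneNumber;
```

The table only needs to be built once, so later runs reduce to the final join.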

Maybe you could create a SQL function to do this.

I think the guts of it would look something like this:

DECLARE @field VARCHAR(50)   -- wide enough for strings longer than 10 characters
SET @field = '11111'

DECLARE @len INT
SET @len = LEN(@field)

DECLARE @counter INT
SET @counter = 1

DECLARE @firstChar VARCHAR(1)
SET @firstChar = NULL

DECLARE @currentChar VARCHAR(1)
SET @currentChar = NULL

DECLARE @allSameNumber BIT
SET @allSameNumber = 1

-- Walk the string one character at a time; stop at the first mismatch
WHILE @counter <= @len AND @allSameNumber = 1 BEGIN

    SET @currentChar = SUBSTRING(@field,@counter,1) 
    IF @firstChar IS NULL BEGIN
        SET @firstChar = @currentChar
    END 
    -- LIKE '[0-9]' is used instead of ISNUMERIC, which also accepts characters such as '+', '-' and '$'
    IF @currentChar NOT LIKE '[0-9]' OR @currentChar <> @firstChar BEGIN
        SET @allSameNumber = 0
    END
    SET @counter = @counter + 1

END

SELECT @allSameNumber
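
Wrapped up as a scalar function, the same loop might look like the sketch below. The function name `dbo.IsSingleRepeatedDigit` and the 50-character parameter width are assumptions made for illustration, as is the choice to return 0 for NULL or empty input.

```sql
-- Hypothetical scalar function; the name and parameter width are illustrative.
CREATE FUNCTION dbo.IsSingleRepeatedDigit (@field VARCHAR(50))
RETURNS BIT
AS
BEGIN
    -- Assumption: NULL or empty input is not a match.
    IF @field IS NULL OR @field = '' RETURN 0;

    DECLARE @firstChar CHAR(1) = LEFT(@field, 1);
    DECLARE @counter INT = 1;

    -- The repeated character must itself be a digit.
    IF @firstChar NOT LIKE '[0-9]' RETURN 0;

    -- Every character must equal the first one.
    WHILE @counter <= LEN(@field)
    BEGIN
        IF SUBSTRING(@field, @counter, 1) <> @firstChar RETURN 0;
        SET @counter = @counter + 1;
    END

    RETURN 1;
END
GO  -- batch separator understood by SSMS and sqlcmd

-- Usage against the table from the question.
SELECT phone
FROM customers
WHERE dbo.IsSingleRepeatedDigit(phone) = 1;
```

A scalar function like this is evaluated once per row, so on a large table it will likely be slower than the `REPLICATE` comparison or the lookup-table join.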

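Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.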

 