Using PATINDEX to find varying length patterns in T-SQL

I'm looking to pull floats out of some varchars, using PATINDEX() to spot them. I know that in each varchar string I'm only interested in the first float that exists, but the floats might have different lengths.

For example:

'some text 456.09 other text'
'even more text 98273.453 la la la'

I would normally match these with the regex

  "[0-9]+[.][0-9]+"

However, I can't find an equivalent of the + operator that PATINDEX accepts, so the two strings would need to be matched (respectively) with:

'[0-9][0-9][0-9].[0-9][0-9]' and '[0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9]' 

Is there any way to match both of these example varchars with a single valid PATINDEX pattern?
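A quick check in the same T-SQL, with the positions counted by hand from the two example strings, shows why fixed-length patterns don't generalize: each one only matches numbers of exactly its own shape, and a shorter pattern can still match inside a longer number, at the wrong position.

```sql
-- The 3.2-digit pattern finds 456.09 at its first digit
SELECT PATINDEX('%[0-9][0-9][0-9].[0-9][0-9]%', 'some text 456.09 other text')                 -- 11
-- The 5.3-digit pattern finds nothing in the first string
SELECT PATINDEX('%[0-9][0-9][0-9][0-9][0-9].[0-9][0-9][0-9]%', 'some text 456.09 other text')  -- 0
-- The 3.2-digit pattern still matches inside 98273.453, but starting at the '2', not the '9'
SELECT PATINDEX('%[0-9][0-9][0-9].[0-9][0-9]%', 'even more text 98273.453 la la la')           -- 18
```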

I blogged about this a while ago: Extracting numbers with SQL Server

Declare @Temp Table(Data VarChar(100))

Insert Into @Temp Values('some text 456.09 other text')
Insert Into @Temp Values('even more text 98273.453 la la la')
Insert Into @Temp Values('There are no numbers in this one')

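-- Cut the string at the first digit, '.' or '-' (a row without one keeps the whole string),
-- then keep everything up to the first character that is not a digit, '.' or '-'.
-- The appended 'X' guarantees that the second PatIndex finds something when the number ends the string.
-- Rows with no number at all come back as an empty string.
Select Left(
             SubString(Data, PatIndex('%[0-9.-]%', Data), 8000),
             PatIndex('%[^0-9.-]%', SubString(Data, PatIndex('%[0-9.-]%', Data), 8000) + 'X')-1)
From   @Temp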
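A variation on the same SubString/PatIndex idea, offered as a sketch rather than the blog's code: anchor on the first digit-dot-digit sequence with PATINDEX('%[0-9].[0-9]%'), then count how many digits sit on each side of the point. That skips a plain integer or a lone '-' appearing earlier in the string, which the blog query above would return instead. The function name dbo.FirstFloat is hypothetical; the sketch returns the number as text (CAST it to float if needed), ignores signs, and needs at least one digit before the point, so '.5' is not found.

```sql
-- dbo.FirstFloat is a hypothetical helper name used for this sketch.
CREATE FUNCTION dbo.FirstFloat (@s varchar(max))
RETURNS TABLE
AS
RETURN
    SELECT CASE WHEN d.p > 0
                THEN SUBSTRING(@s, d.p - i.k + 1, i.k + 1 + f.m)
           END AS FloatText
    -- p: position of the digit just before the '.' in the first digit-dot-digit run (0 = none);
    --    '.' is a literal character in PATINDEX/LIKE patterns, not a wildcard
    FROM (SELECT PATINDEX('%[0-9].[0-9]%', @s) AS p) AS d
    -- k: number of consecutive digits ending at p (reverse the prefix, find the first non-digit)
    CROSS APPLY (SELECT PATINDEX('%[^0-9]%', REVERSE(LEFT(@s, d.p)) + 'X') - 1 AS k) AS i
    -- m: number of consecutive digits after the '.'
    CROSS APPLY (SELECT PATINDEX('%[^0-9]%', SUBSTRING(@s, d.p + 2, 8000) + 'X') - 1 AS m) AS f;
GO

-- Usage: the third row is a made-up string with an integer before the float
DECLARE @Temp TABLE (Data varchar(100));
INSERT INTO @Temp VALUES ('some text 456.09 other text'),
                         ('even more text 98273.453 la la la'),
                         ('version 2 costs 7.5 today'),
                         ('There are no numbers in this one');

-- One row per input: 456.09, 98273.453, 7.5 and NULL
SELECT t.Data, f.FloatText
FROM @Temp AS t
CROSS APPLY dbo.FirstFloat(t.Data) AS f;
```

Writing it as an inline table-valued function rather than a scalar function keeps it usable with CROSS APPLY over a whole table, and the digit counting is done with PATINDEX instead of a character-by-character loop.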

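Wildcards. Putting % between the character classes lets the digit runs on each side of the decimal point have any length: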

SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%','some text 456.09 other text')
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%','even more text 98273.453 la la la')
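A few extra calls, with positions counted by hand, show what the wildcard pattern returns and where it can mislead; the last two strings are made-up inputs, not from the question. Because % matches any characters, not only digits, the returned position can belong to an earlier integer, and the pattern as written needs at least two digits on each side of the point. PATINDEX also gives only the start position, so the end of the number still has to be found separately.

```sql
-- Start of 456.09 and of 98273.453 in the question's strings
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%', 'some text 456.09 other text')        -- 11
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%', 'even more text 98273.453 la la la')  -- 16
-- '%' also spans ', price ', so the match starts at the integer 5, not at 3.14
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%', 'room 5, price 3.14')                 -- 6
-- Only one digit before the point, so the pattern finds nothing
SELECT PATINDEX('%[0-9]%[0-9].[0-9]%[0-9]%', 'pi is 3.14')                         -- 0
```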

Yes, you need to go through the CLR to get regex support. But if PATINDEX does not do what you need, regular expressions were designed for exactly that.

http://msdn.microsoft.com/en-us/magazine/cc163473.aspx
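A minimal sketch of the database side of the CLR route, assuming you have already compiled a .NET assembly containing a static method that returns Regex.Match(input, pattern).Value. The assembly name RegexFunctions, the class UserDefinedFunctions, the method and function name RegexMatch, and the DLL path are all hypothetical placeholders. On SQL Server 2017 and later, the 'clr strict security' setting generally also requires the assembly to be signed, or whitelisted with sys.sp_add_trusted_assembly, before CREATE ASSEMBLY succeeds.

```sql
-- 1. Allow CLR code to run on the instance (server-level setting)
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

-- 2. Load the compiled assembly (hypothetical name and path)
CREATE ASSEMBLY RegexFunctions
FROM 'C:\clr\RegexFunctions.dll'
WITH PERMISSION_SET = SAFE;
GO

-- 3. Expose the static method as a scalar function:
--    EXTERNAL NAME is assembly.[namespace.class].method (all hypothetical here)
CREATE FUNCTION dbo.RegexMatch (@input nvarchar(max), @pattern nvarchar(4000))
RETURNS nvarchar(max)
AS EXTERNAL NAME RegexFunctions.[RegexFunctions.UserDefinedFunctions].RegexMatch;
GO

-- 4. The question's regex can then be used as-is
SELECT dbo.RegexMatch(N'even more text 98273.453 la la la', N'[0-9]+[.][0-9]+');
```

The signature uses nvarchar because SQL CLR maps .NET strings (SqlString) to nvarchar.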

This should be checked for robustness (for example, what happens if the string only contains an int?), but it should put you on the right track:

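if exists (select routine_name from information_schema.routines where routine_name = 'GetFirstFloat')
    drop function dbo.GetFirstFloat
go

create function dbo.GetFirstFloat (@string varchar(max))
returns float
as
begin
    declare @float varchar(max)
    declare @pos int
    declare @ch char(1)
    declare @seenDot bit

    -- start at the first digit; patindex returns 0 if there is none (NULL for NULL input)
    select @pos = patindex('%[0-9]%', @string)
    select @float = '', @seenDot = 0

    if isnull(@pos, 0) = 0
        return null

    -- collect digits plus at most one '.' that is followed by a digit;
    -- isnumeric() is avoided because it also returns 1 for characters such as
    -- '$', ',', '+' and '-', which made the final cast fail on input like '3.14,'
    while @pos <= len(@string)
    begin
        select @ch = substring(@string, @pos, 1)

        if @ch like '[0-9]'
            select @float = @float + @ch
        else if @ch = '.' and @seenDot = 0 and substring(@string, @pos + 1, 1) like '[0-9]'
            select @float = @float + @ch, @seenDot = 1
        else
            break

        select @pos = @pos + 1
    end

    return cast(@float as float)
end
go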


select dbo.GetFirstFloat('this is a string containing pi 3.14159216 and another non float 3 followed by a new float 5.41 and that''s it')
select dbo.GetFirstFloat('this is a string with no float')
select dbo.GetFirstFloat('this is another string with an int 3')

PATINDEX is not powerful enough to do that. You should use regular expressions.

SQL Server has supported regular expressions since SQL Server 2005, but through CLR integration (a .NET function registered in the database) rather than as a built-in T-SQL feature.

Given that the pattern varies in length, you're going to have a rough time getting this to work with PATINDEX. There is another post I wrote, which I've modified to do what you're trying to do here. Will this work for you?

-- Numbers table (1..7999) used to walk each string position by position
CREATE TABLE #nums (n INT)
DECLARE @i INT 
SET @i = 1
WHILE @i < 8000 
BEGIN
    INSERT #nums VALUES(@i)
    SET @i = @i + 1
END

CREATE TABLE #tmp (
  id INT IDENTITY(1,1) not null,
  words VARCHAR(MAX) null
)

INSERT INTO #tmp
VALUES('I''m looking for a number, regardless of length, even 23.258 long'),('Maybe even pi which is roughly 3.14159265358,'),('or possibly something else that isn''t a number')

-- Treat commas as word separators so that '3.14159265358,' becomes a word of its own
UPDATE #tmp SET words = REPLACE(words, ',',' ')

-- CTE: one row per space-delimited word; n is the position where the word starts
--      in ' ' + words + ' ', and rownum follows (id, n) so words keep their order
-- ids: for each id, the first word that ISNUMERIC accepts
--      (note that ISNUMERIC also returns 1 for words such as '-', '$' or '.')
;WITH CTE AS (SELECT ROW_NUMBER() OVER (ORDER BY ID, n) AS rownum, ID, NULLIF(SUBSTRING(' ' + words + ' ' , n , CHARINDEX(' ' , ' ' + words + ' ' , n) - n) , '') AS word
    FROM #nums, #tmp
    WHERE n <= LEN(' ' + words + ' ') AND SUBSTRING(' ' + words + ' ' , n - 1, 1) = ' ' 
    AND CHARINDEX(' ' , ' ' + words + ' ' , n) - n > 0),
    ids AS (SELECT ID, MIN(rownum) AS rownum FROM CTE WHERE ISNUMERIC(word) = 1 GROUP BY id)
SELECT CTE.rownum, cte.id, cte.word
FROM CTE, ids WHERE cte.id = ids.id AND cte.rownum = ids.rownum

The explanation and origin of the code are covered in more detail in the original post.

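Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.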

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM