How to compare if two strings contain the same words in T-SQL for SQL Server 2008?

When I compare two strings in SQL Server, there are a couple of simple ways: = or LIKE.

I want to redefine equality as:

If two strings contain the same words - no matter in what order - they are equal, otherwise they are not.

For example:

  • 'my word' and 'word my' are equal
  • 'my word' and 'aaamy word' are not

What's the best simple solution for this problem?

I don't think there is a simple solution for what you are trying to do in SQL Server. My first thought would be to create a CLR UDF that:

  1. Accepts two strings
  2. Breaks them into two arrays using the split function on " "
  3. Compares the contents of the two arrays, returning true if they contain the same elements.

If this is a route you'd like to go, take a look at this article to get started on creating CLR UDFs.

Try this. The StringSorter function splits a string on a separator (a space in the examples here), sorts the words, and puts the string back together in sorted word order.

CREATE FUNCTION dbo.StringSorter(@sep char(1), @s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @ResultVar varchar(8000);

    WITH sorter_cte AS (
      SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
      UNION ALL
      SELECT CHARINDEX(@sep, @s, pos + 1), pos
      FROM sorter_cte
      WHERE pos > 0
    )
    , step2_cte AS (
    SELECT SUBSTRING(@s, lastPos + 1,
             case when pos = 0 then 8000
             else pos - lastPos -1 end) as chunk
    FROM sorter_cte
    )
    -- Concatenate the sorted chunks; STUFF strips the leading separator space.
    SELECT @ResultVar = STUFF((select ' ' + chunk 
                               from step2_cte 
                               order by chunk 
                               FOR XML PATH('')), 1, 1, '');
    RETURN @ResultVar;
END
GO

Here is a test case just trying out the function:

SELECT dbo.StringSorter(' ', 'the quick brown dog jumped over the lazy fox');

which produced these results:

  brown dog fox jumped lazy over quick the the

Then, to run it from a SELECT statement using your strings:

SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'word my') 
               then 'Equal' else 'Not Equal' end as ResultCheck
SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'aaamy word') 
               then 'Equal' else 'Not Equal' end as ResultCheck

The first query returns 'Equal' and the second returns 'Not Equal'.

This should do exactly what you are looking for with a simple function utilizing a recursive CTE to sort your string.

Enjoy!

A VERY simple way to do this... JC65100

ALTER FUNCTION [dbo].[ITS_GetDifCharCount] 
(
@str1 VARCHAR(MAX)
,@str2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
DECLARE @result INT

SELECT @result = COUNT(*)
FROM dbo.ITS_CompareStrs(@str1,@str2 )

RETURN @result

END
GO

ALTER FUNCTION [dbo].[ITS_CompareStrs]
(
@str1 VARCHAR(MAX)
,@str2 VARCHAR(MAX)
)
RETURNS 
@Result TABLE  (ind INT, c1 char(1), c2 char(1))
AS
BEGIN
    DECLARE @i AS INT
             ,@c1 CHAR(1)
             ,@c2 CHAR(1)

    SET @i = 1

    WHILE LEN (@str1) > @i-1  OR LEN (@str2) > @i-1   
    BEGIN

      IF LEN (@str1) > @i-1
        SET @c1 = substring(@str1, @i, 1)  

      IF LEN (@str2) > @i-1
        SET @c2 = substring(@str2, @i, 1)

      INSERT INTO @Result([ind],c1,c2)
      SELECT @i,@c1,@c2

      SELECT @i=@i+1
              ,@c1=NULL
              ,@c2=NULL

    END

    DELETE FROM @Result
    WHERE c1=c2


RETURN 
END
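
For illustration, here is roughly how these two functions could be called, assuming both have already been created in the database. Note that they compare the strings position by position, so reordered words count as differences.

```sql
-- Assumes dbo.ITS_CompareStrs and dbo.ITS_GetDifCharCount already exist.
SELECT dbo.ITS_GetDifCharCount('my word', 'my word') AS DiffCount;  -- 0: identical strings
SELECT dbo.ITS_GetDifCharCount('my word', 'word my') AS DiffCount;  -- 7: every position differs

-- Inspect the differing positions themselves.
SELECT ind, c1, c2
FROM dbo.ITS_CompareStrs('my word', 'my ward')   -- one row: position 5, 'o' vs 'a'
ORDER BY ind;
```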

There is no simple way to do this. You are advised to write a function or stored procedure that does the processing involved with this requirement.

Your function can use other functions that split the strings into parts, sort by words, etc.

Here's how you can split the strings:

T-SQL: Opposite to string concatenation - how to split string into multiple records

The scenario is as follows: use a TVF to split the first and second strings on spaces, then FULL JOIN the two resulting tables on the word values. If there are NULLs on the left or right side, the strings are not equal; otherwise they are equal.
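
A minimal sketch of the split-and-full-join idea, not a definitive implementation. It assumes a hypothetical inline TVF named dbo.SplitWords (SQL Server 2008 has no built-in split function) that splits on spaces by casting to XML, so it only suits input without XML-special characters such as &, < or >.

```sql
-- Hypothetical helper: returns the distinct words of @s, split on spaces.
-- Assumption: @s contains no XML-special characters (&, <, >).
CREATE FUNCTION dbo.SplitWords (@s varchar(8000))
RETURNS TABLE
AS
RETURN
    SELECT DISTINCT n.x.value('.', 'varchar(8000)') AS word
    FROM (SELECT CAST('<w>' + REPLACE(@s, ' ', '</w><w>') + '</w>' AS xml) AS doc) AS d
    CROSS APPLY d.doc.nodes('/w') AS n(x)
    WHERE n.x.value('.', 'varchar(8000)') <> '';   -- skip empties from repeated spaces
GO

DECLARE @s1 varchar(8000), @s2 varchar(8000);
SET @s1 = 'my word';
SET @s2 = 'word my';

-- A NULL on either side of the FULL JOIN means a word exists in only one string.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM dbo.SplitWords(@s1) AS a
           FULL JOIN dbo.SplitWords(@s2) AS b ON a.word = b.word
           WHERE a.word IS NULL OR b.word IS NULL
       ) THEN 'Not Equal' ELSE 'Equal' END AS ResultCheck;
```

Because the helper returns distinct words, repeated words are ignored; drop the DISTINCT and compare counts as well if duplicates should matter.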

There is a library called SQL# (http://www.sqlsharp.com/) that contains a whole range of useful string/math functions.

It has a function called String_CompareSplitValues which does precisely what you want.

I am not sure whether it is in the community version or the paid-for version.

declare @s1 varchar(50) = 'my word'
declare @s2 varchar(50) = 'word my'

declare @t1 table (word varchar(50))

while len(@s1)>0 
begin
    if (CHARINDEX(' ', @s1)>0)
    begin       
        insert into @t1 values(ltrim(rtrim(LEFT(@s1, charindex(' ', @s1)))))        
        set @s1 = LTRIM(rtrim(right(@s1, len(@s1)-charindex(' ', @s1))))
    end
    else
    begin
        insert into @t1 values (@s1)
        set @s1=''      
    end     
end

declare @t2 table (word varchar(50))
while len(@s2)>0 
begin
    if (CHARINDEX(' ', @s2)>0)
    begin       
        insert into @t2 values(ltrim(rtrim(LEFT(@s2, charindex(' ', @s2)))))        
        set @s2 = LTRIM(rtrim(right(@s2, len(@s2)-charindex(' ', @s2))))
    end
    else
    begin
        insert into @t2 values (@s2)
        set @s2=''      
    end     
end

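-- Check both directions: a word missing from either side means the strings differ.
select case when exists(SELECT * FROM @t1 EXCEPT SELECT * FROM @t2)
              or exists(SELECT * FROM @t2 EXCEPT SELECT * FROM @t1)
            then 'are not' else 'are equal' end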

You can add a precomputed column to the base table, populated by an INSERT/UPDATE trigger (or a UDF default), that splits, sorts, and then concatenates the words from the original column.

Then use = to compare these precomputed columns.
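
A rough sketch of the trigger variant. The table dbo.Phrases, its columns, and the trigger name are hypothetical, and the sketch assumes the dbo.StringSorter function from the earlier answer has been created.

```sql
-- Hypothetical table; all names here are for illustration only.
CREATE TABLE dbo.Phrases (
    Id           int IDENTITY(1,1) PRIMARY KEY,
    Phrase       varchar(8000) NOT NULL,
    PhraseSorted varchar(8000) NULL   -- words of Phrase in sorted order
);
GO

-- Keep PhraseSorted in sync whenever Phrase is inserted or changed.
CREATE TRIGGER dbo.trg_Phrases_Sort ON dbo.Phrases
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    IF NOT UPDATE(Phrase) RETURN;     -- nothing to do if Phrase was not touched

    UPDATE p
    SET PhraseSorted = dbo.StringSorter(' ', i.Phrase)
    FROM dbo.Phrases AS p
    JOIN inserted AS i ON i.Id = p.Id;
END
GO

-- Pairs of rows whose phrases contain the same words now match with plain =.
SELECT a.Id AS Id1, b.Id AS Id2
FROM dbo.Phrases AS a
JOIN dbo.Phrases AS b
  ON a.PhraseSorted = b.PhraseSorted AND a.Id < b.Id;
```

The cost of splitting and sorting is paid once per write rather than on every comparison, which is the main appeal of this approach.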
