How to compare if two strings contain the same words in T-SQL for SQL Server 2008?

When I compare two strings in SQL Server, there are a couple of simple ways to do it: = or LIKE.

I want to redefine equality as:

If two strings contain the same words - no matter in what order - they are equal, otherwise they are not.

For example:

  • 'my word' and 'word my' are equal
  • 'my word' and 'aaamy word' are not

What's the best simple solution for this problem?

I don't think there is a simple solution for what you are trying to do in SQL Server. My first thought would be to create a CLR UDF that:

  1. Accepts two strings
  2. Breaks them into two arrays using the split function on " "
  3. Compares the contents of the two arrays, returning true if they contain the same elements.

If this is a route you'd like to go, take a look at this article to get started on creating CLR UDFs.

Try this... The StringSorter function splits a string on a separator (a space here), sorts the words, and puts the string back together in sorted word order.

CREATE FUNCTION dbo.StringSorter(@sep char(1), @s varchar(8000))
RETURNS varchar(8000)
AS
BEGIN
    DECLARE @ResultVar varchar(8000);

    WITH sorter_cte AS (
      SELECT CHARINDEX(@sep, @s) as pos, 0 as lastPos
      UNION ALL
      SELECT CHARINDEX(@sep, @s, pos + 1), pos
      FROM sorter_cte
      WHERE pos > 0
    )
    , step2_cte AS (
    SELECT SUBSTRING(@s, lastPos + 1,
             case when pos = 0 then 8000  -- last chunk: take the rest of the string
             else pos - lastPos -1 end) as chunk
    FROM sorter_cte
    )
    SELECT @ResultVar = (select ' ' + chunk 
                                     from step2_cte 
                                     order by chunk 
                                     FOR XML PATH(''));
    RETURN @ResultVar;
END
GO

Here is a test case that tries out the function:

SELECT dbo.StringSorter(' ', 'the quick brown dog jumped over the lazy fox');

which produced these results:

  brown dog fox jumped lazy over quick the the

Then, to run it from a select statement using your strings:

SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'word my') 
               then 'Equal' else 'Not Equal' end as ResultCheck
SELECT case when dbo.StringSorter(' ', 'my word') = 
                     dbo.StringSorter(' ', 'aaamy word') 
               then 'Equal' else 'Not Equal' end as ResultCheck

The first query reports that the strings are equal; the second reports that they are not.

This should do exactly what you are looking for with a simple function utilizing a recursive CTE to sort your string.

Enjoy!

A VERY simple way to do this... JC65100

ALTER FUNCTION [dbo].[ITS_GetDifCharCount] 
(
@str1 VARCHAR(MAX)
,@str2 VARCHAR(MAX)
)
RETURNS INT
AS
BEGIN
DECLARE @result INT

SELECT @result = COUNT(*)
FROM dbo.ITS_CompareStrs(@str1,@str2 )

RETURN @result

END


ALTER FUNCTION [dbo].[ITS_CompareStrs]
(
@str1 VARCHAR(MAX)
,@str2 VARCHAR(MAX)
)
RETURNS 
@Result TABLE  (ind INT, c1 char(1), c2 char(1))
AS
BEGIN
    DECLARE @i AS INT
             ,@c1 CHAR(1)
             ,@c2 CHAR(1)

    SET @i = 1

    WHILE LEN (@str1) > @i-1  OR LEN (@str2) > @i-1   
    BEGIN

      IF LEN (@str1) > @i-1
        SET @c1 = substring(@str1, @i, 1)  

      IF LEN (@str2) > @i-1
        SET @c2 = substring(@str2, @i, 1)

      INSERT INTO @Result([ind],c1,c2)
      SELECT @i,@c1,@c2

      SELECT @i=@i+1
              ,@c1=NULL
              ,@c2=NULL

    END

    DELETE FROM @Result
    WHERE c1=c2


RETURN 
END
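
As a quick check of what these two functions measure, here is a hedged usage sketch. It assumes both functions already exist in the database (the definitions above use ALTER, so they would need CREATE on first use). ITS_CompareStrs pairs up the characters at each position and keeps only the positions that differ, so the count is a positional character difference, not a word comparison.

```sql
-- Assumes dbo.ITS_GetDifCharCount and dbo.ITS_CompareStrs from above exist.
-- Identical strings: no differing positions.
SELECT dbo.ITS_GetDifCharCount('my word', 'my word') AS DifCount;   -- 0

-- Same words in a different order: every one of the 7 positions differs.
SELECT dbo.ITS_GetDifCharCount('my word', 'word my') AS DifCount;   -- 7
```

So a result of 0 means the strings match character for character; reordered words such as 'my word' and 'word my' are still reported as different.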

There is no simple way to do this. You are advised to write a function or stored procedure that does the processing involved with this requirement.

Your function can use other functions that split the strings into parts, sort by words, etc.

Here's how you can split the strings:

T-SQL: Opposite to string concatenation - how to split string into multiple records

The scenario is as follows. Use a TVF (table-valued function) to split the first and the second strings on spaces, then FULL JOIN the two resulting tables on the word values. If any row has a NULL on the left or right side, the strings are not equal; otherwise they are equal.
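
A minimal sketch of that split-and-full-join idea, written for SQL Server 2008 (which has no built-in split function). The function name dbo.SplitWords and its parameters are hypothetical, chosen only for illustration; any split TVF would do.

```sql
-- Hypothetical helper: returns one row per non-empty word in @s.
CREATE FUNCTION dbo.SplitWords (@s VARCHAR(8000), @sep CHAR(1))
RETURNS @words TABLE (word VARCHAR(8000))
AS
BEGIN
    DECLARE @pos INT;
    SET @s = @s + @sep;                -- sentinel so the last word is terminated too
    SET @pos = CHARINDEX(@sep, @s);
    WHILE @pos > 0
    BEGIN
        IF @pos > 1                    -- skip empty chunks from repeated separators
            INSERT INTO @words (word) VALUES (LEFT(@s, @pos - 1));
        SET @s = SUBSTRING(@s, @pos + 1, 8000);
        SET @pos = CHARINDEX(@sep, @s);
    END
    RETURN;
END
GO

DECLARE @s1 VARCHAR(8000), @s2 VARCHAR(8000);
SET @s1 = 'my word';
SET @s2 = 'word my';

-- Any unmatched row (NULL on either side of the full join) means inequality.
SELECT CASE WHEN EXISTS (
           SELECT 1
           FROM dbo.SplitWords(@s1, ' ') AS a
           FULL JOIN dbo.SplitWords(@s2, ' ') AS b
             ON a.word = b.word
           WHERE a.word IS NULL OR b.word IS NULL
       ) THEN 'Not Equal' ELSE 'Equal' END AS ResultCheck;
```

Note that this treats the strings as sets of words: 'my my word' and 'my word' would compare as equal, and the word comparison follows the database collation. If repeated words should count, compare sorted strings or per-word counts instead.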

There is a library at http://www.sqlsharp.com/ that contains a whole range of useful string/math functions.

It has a function called String_CompareSplitValues which does precisely what you want.

I am not sure if it is in the community version or the paid version.

declare @s1 varchar(50) = 'my word'
declare @s2 varchar(50) = 'word my'

declare @t1 table (word varchar(50))

while len(@s1)>0 
begin
    if (CHARINDEX(' ', @s1)>0)
    begin       
        insert into @t1 values(ltrim(rtrim(LEFT(@s1, charindex(' ', @s1)))))        
        set @s1 = LTRIM(rtrim(right(@s1, len(@s1)-charindex(' ', @s1))))
    end
    else
    begin
        insert into @t1 values (@s1)
        set @s1=''      
    end     
end

declare @t2 table (word varchar(50))
while len(@s2)>0 
begin
    if (CHARINDEX(' ', @s2)>0)
    begin       
        insert into @t2 values(ltrim(rtrim(LEFT(@s2, charindex(' ', @s2)))))        
        set @s2 = LTRIM(rtrim(right(@s2, len(@s2)-charindex(' ', @s2))))
    end
    else
    begin
        insert into @t2 values (@s2)
        set @s2=''      
    end     
end

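-- check both directions: a single EXCEPT would miss extra words that appear only in @t2
select case when exists(SELECT * FROM @t1 EXCEPT SELECT * FROM @t2)
              or exists(SELECT * FROM @t2 EXCEPT SELECT * FROM @t1)
            then 'are not' else 'are equal' end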

You can add a precomputed column to the base table, populated by an INSERT/UPDATE trigger (or a UDF default), that splits, sorts, and then concatenates the words from the original column.

Then use = to compare these precomputed columns.
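
One way this could look, as a sketch only. The table dbo.Phrases, its columns, and the trigger name are hypothetical. The sketch reuses the dbo.StringSorter function from the earlier answer and assumes the database's RECURSIVE_TRIGGERS option is OFF (the default), so the trigger's own UPDATE does not fire it again.

```sql
-- Hypothetical table; phrase_sorted holds the sorted-words form of phrase.
CREATE TABLE dbo.Phrases (
    id            INT IDENTITY(1,1) PRIMARY KEY,
    phrase        VARCHAR(200) NOT NULL,
    phrase_sorted VARCHAR(250) NULL
);
GO

-- Keep phrase_sorted in step with phrase on every insert or update.
CREATE TRIGGER dbo.trg_Phrases_Sort ON dbo.Phrases
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE p
    SET    phrase_sorted = dbo.StringSorter(' ', p.phrase)
    FROM   dbo.Phrases AS p
    JOIN   inserted AS i ON i.id = p.id;
END
GO

-- Find stored phrases that contain the same words as a search string.
SELECT id, phrase
FROM   dbo.Phrases
WHERE  phrase_sorted = dbo.StringSorter(' ', 'word my');
```

The sorting cost is paid once per write instead of on every comparison, and the precomputed column can be indexed if lookups are frequent.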
