In SQL, how do you check for similar row values to another

Question

I'm using SQL Server (T-SQL). I have a column with the row values A, AB, ABC, AC. I want to remove any value that is contained within another row's value. In this case I'd be left with ABC and AC, since A and AB are contained in the other two.

My thought is to take each value of the column, use LIKE to search through the whole column, and count the number of results returned. If the count is equal to 1, the value matches only itself and so is not contained in any other row.

Is that a good way to do it? I ask because I'm reluctant to use loops/cursors.

Thanks

Here is a code sample taken from the explanation above:

CREATE TABLE #t (words varchar(10))
INSERT INTO #t 
VALUES ('A'),('AB'),('ABC'),('AC')

Using cursors, I think I'd do something like:

DECLARE @branches TABLE (words varchar(10), n int)
DECLARE @word VARCHAR(10)
DECLARE cursor_word CURSOR

FOR SELECT words FROM #t

OPEN cursor_word;
FETCH NEXT FROM cursor_word INTO @word
WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO @branches SELECT @word, COUNT(*) FROM #t WHERE words like CONCAT('%', @word ,'%')
        FETCH NEXT FROM cursor_word INTO @word
    END

CLOSE cursor_word
DEALLOCATE cursor_word

SELECT * FROM @branches WHERE n = 1

Answer 1

You could try something like this:

SELECT *
FROM (
    SELECT *
    , Row_Number() OVER(ORDER BY Words) N -- Create identifier for the row
    FROM #t
) t1
LEFT JOIN (
    SELECT *
    , Row_Number() OVER(ORDER BY Words) N -- Create identifier for the row
    FROM #t 
) t2 on t1.N <> t2.n -- Where the identifier is different 
    AND t2.Words LIKE t1.Words + '%' -- Where t2.Words starts with t1.Words
WHERE t2.Words IS NULL -- And there is no match of t2.
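
Note that the join above matches prefixes only. If "contained" should mean a match anywhere in the string, a minimal variant might look like the sketch below. It assumes the #t table from the question and that the values are unique; the CTE name numbered is made up for illustration. It also returns only the surviving words rather than every column from both sides of the join.

```sql
-- Sketch only: assumes the #t table from the question and unique values in words.
-- If this follows another statement in the same batch, that statement must end with ';'.
WITH numbered AS (
    SELECT words,
           ROW_NUMBER() OVER (ORDER BY words) AS n  -- surrogate identifier for each row
    FROM #t
)
SELECT t1.words
FROM numbered AS t1
LEFT JOIN numbered AS t2
    ON t1.n <> t2.n                          -- compare against a different row
   AND t2.words LIKE '%' + t1.words + '%'    -- t2.words contains t1.words anywhere
WHERE t2.words IS NULL;                      -- keep rows that no other row contains
```

With the sample data A, AB, ABC, AC this returns ABC and AC.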

Answer 2

I would just use not exists for this. This requires a primary key in your table (which is a must-have anyway), so let me assume it is called id:

select t.*
from mytable t
where not exists (
    select 1 
    from mytable t1 
    where t1.id <> t.id and t1.word like '%' + t.word + '%'
)
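
To try this against the question's data, a setup along the following lines could be used. The table definition is an assumption made for illustration (an IDENTITY column as the primary key and a word column, matching the names used in the query above), not something given in the question.

```sql
-- Hypothetical setup: the table layout is assumed, only the names mytable, id and word
-- come from the query above.
CREATE TABLE mytable (
    id   int IDENTITY(1,1) PRIMARY KEY,  -- surrogate key used to tell rows apart
    word varchar(10) NOT NULL
);

INSERT INTO mytable (word)
VALUES ('A'), ('AB'), ('ABC'), ('AC');

-- Keep only the words that no other row contains.
SELECT t.word
FROM mytable AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM mytable AS t1
    WHERE t1.id <> t.id                      -- a different row
      AND t1.word LIKE '%' + t.word + '%'    -- whose word contains this one
);
```

One consequence of comparing by id: if the same word appears in two rows, each row counts as containing the other, so both are filtered out.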

Answer 3

I would use not exists, but no primary key is necessary:

select t.*
from t
where not exists (select 1
                  from t t2
                  where t2.words like '%' + t.words + '%' and
                        t2.words <> t.words
                 );

Here is a db<>fiddle.

The method that you describe is:

select t.*
from t
where (select count(*)
       from t t2
       where t2.words like '%' + t.words + '%' 
      ) = 1;

If you have no duplicates, this is functionally equivalent to the not exists version. However, not exists is much better. Why? The aggregation version has to go through every row to calculate the count. The not exists version can stop at the first match -- which can significantly reduce the number of like comparisons.
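
One caveat applies to every LIKE-based version: when a stored value is used as a pattern, characters such as %, _ and [ in it are treated as wildcards. If the data may contain them, a possible alternative is CHARINDEX, which searches for the value literally. The sketch below assumes the #t table from the question.

```sql
-- Sketch only: assumes the #t table from the question.
SELECT t.words
FROM #t AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM #t AS t2
    WHERE CHARINDEX(t.words, t2.words) > 0  -- t2.words contains t.words as a literal substring
      AND t2.words <> t.words               -- ignore the row itself and exact duplicates
);
```

Like the not exists query above, this can stop at the first containing row it finds.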

3 answers

solution1
1 2020-07-26 08:03:35

solution2
1 2020-07-26 09:50:52

solution3
1 ACCEPTED 2020-07-26 12:07:09
