
Comparing values separated by a special character in SQL Server

I have 2 columns, let's say COLA and COLB, with data as follows:

COLA              | COLB
------------------+------------------
PLATE|SPOON|GLASS | PLATE|GLASS|SPOON
PLATE             | SPOON
OIL|JUG|MAT       | JUG|MAT
SPOON             | SPOON
OIL|MAT           | MAT|OIL

I'm trying to return the rows where COLA and COLB do not contain the same values, irrespective of order.

Expected output:

COLA        | COLB
------------+--------
PLATE       | SPOON
OIL|JUG|MAT | JUG|MAT

I have tried something like the query below, among many other things, but nothing works. I don't have much SQL knowledge:

SELECT *
FROM MYTABLE
WHERE COLA NOT LIKE '%COLB%'

One method is a recursive CTE:

with cte as (
      select convert(varchar(max), null) as parta,
             convert(varchar(max), cola) as resta,
             cola, colb,
             row_number() over (order by (select null)) as seqnum
      from t
      union all
      select convert(varchar(max),
                     left(resta, charindex('|', resta + '|') - 1)
                    ) as parta,
             convert(varchar(max),
                     stuff(resta, 1, charindex('|', resta + '|'), '')
                    ) as resta,
             cola, colb, seqnum
      from cte
      where resta <> ''
     )
select cola, colb
from cte
where parta is not null
group by seqnum, cola, colb
having sum(case when concat('|', colb, '|') like concat('%|', parta, '|%') then 1 else 0 end) <> count(*) or
       len(cola) <> len(colb);

Here is a db<>fiddle.

This is much simpler in more recent versions of SQL Server that support string splitting and aggregation.
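
As a rough sketch of what that could look like, assuming SQL Server 2017 or later (STRING_AGG is not available before that) and the same table t with columns cola and colb used above: each list is split, de-duplicated, and re-assembled in sorted order, so two lists holding the same values produce the same string regardless of the original order. The aliases a, b, s, sorted_a and sorted_b are names introduced here for illustration.

```sql
-- Sketch only: assumes SQL Server 2017+ and a table t(cola, colb).
SELECT t.cola, t.colb
FROM t
CROSS APPLY (
    -- Rebuild cola with its distinct values in sorted order
    SELECT STRING_AGG(s.value, '|') WITHIN GROUP (ORDER BY s.value) AS sorted_a
    FROM (SELECT DISTINCT value FROM STRING_SPLIT(t.cola, '|')) AS s
) AS a
CROSS APPLY (
    -- Rebuild colb the same way
    SELECT STRING_AGG(s.value, '|') WITHIN GROUP (ORDER BY s.value) AS sorted_b
    FROM (SELECT DISTINCT value FROM STRING_SPLIT(t.colb, '|')) AS s
) AS b
-- Rows whose normalized lists differ are the non-matching ones
WHERE a.sorted_a <> b.sorted_b;
```

The DISTINCT makes this a set comparison, so 'A|A' and 'A' would count as matching; drop it if duplicates should matter. Rows where either column is NULL are not returned by this version, because the comparison then evaluates to unknown.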

You can use a user defined function to split the delimited strings in each column and then compare the results of that function.

I've chosen to use one of the fastest string-split functions for SQL Server versions prior to 2016 (which introduced a built-in string split): Jeff Moden's DelimitedSplit8K. You can read all about it in his article, Tally OH! An Improved SQL 8K “CSV Splitter” Function.

First, create and populate a sample table (please save us this step in your future questions):

DECLARE @T AS TABLE (
    ColA varchar(100),
    ColB varchar(100)
);

INSERT INTO @T (ColA, ColB) VALUES
('PLATE|SPOON|GLASS', 'PLATE|GLASS|SPOON'),
('PLATE', 'SPOON'),
('OIL|JUG|MAT', 'JUG|MAT'),
('SPOON', 'SPOON'),
('OIL|MAT', 'MAT|OIL');

The query:

SELECT ColA, ColB
FROM @T
WHERE EXISTS (
    SELECT Item FROM [dbo].[DelimitedSplit8K](ColA, '|')
    EXCEPT 
    SELECT Item FROM [dbo].[DelimitedSplit8K](ColB, '|')
)
OR 
 EXISTS (
    SELECT Item FROM [dbo].[DelimitedSplit8K](ColB, '|')
    EXCEPT 
    SELECT Item FROM [dbo].[DelimitedSplit8K](ColA, '|')
)

Results:

ColA            ColB
PLATE           SPOON
OIL|JUG|MAT     JUG|MAT

You can see a live demo on rextester.

Here is a method that splits the pipe-separated strings into rows using XML functions, then compares the values from cola and colb and returns the differences:

with data2
  as (select row_number() over(order by (select null)) as rnk ,cola,colb
        from t 
      )
  ,combo_data
  as(
   SELECT a.rnk 
          ,a.cola
          ,a.colb
          ,Split.a.value('.', 'NVARCHAR(max)') AS Data
          ,1 as a_flag
          ,null as b_flag
    FROM ( SELECT rnk
                  ,cola
                  ,colb
                  ,CAST ('<M>' + REPLACE(cola, '|', '</M><M>') + '</M>' AS XML) AS Data
            FROM data2    
          ) AS A 
    CROSS APPLY Data.nodes ('/M') AS Split(a)
   union all
   SELECT a.rnk 
          ,a.cola
          ,a.colb
          ,Split.a.value('.', 'NVARCHAR(max)') AS Data 
          ,null as a_flag
          ,1 as b_flag
    FROM ( SELECT rnk
                  ,cola
                  ,colb
                  ,CAST ('<M>' + REPLACE(colb, '|', '</M><M>') + '</M>' AS XML) AS Data
            FROM data2    
          ) AS A 
    CROSS APPLY Data.nodes ('/M') AS Split(a)
    )
    select rnk,cola,colb,data,count(a_flag) as present_in_cola,count(b_flag) as present_in_colb
      from combo_data
      group by rnk,cola,colb,data
      having count(a_flag) <> count(b_flag)
      order by 1,2,3,4


+-----+-------------+---------+-------+-----------------+-----------------+
| rnk |    cola     |  colb   | data  | present_in_cola | present_in_colb |
+-----+-------------+---------+-------+-----------------+-----------------+
|   2 | PLATE       | SPOON   | PLATE |               1 |               0 |
|   2 | PLATE       | SPOON   | SPOON |               0 |               1 |
|   3 | OIL|JUG|MAT | JUG|MAT | OIL   |               1 |               0 |
+-----+-------------+---------+-------+-----------------+-----------------+

db fiddle link https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=4c677e8628d2734305f2b7f1923e583b

Another option is to use a CROSS APPLY to cast the two strings to the XML type. The node values of those XML documents can then be compared in an EXISTS clause.

Sample Data:

CREATE TABLE YourTable (
    ID INT IDENTITY(1,1) PRIMARY KEY,
    ColA NVARCHAR(100),
    ColB NVARCHAR(100)
);

INSERT INTO YourTable (ColA, ColB) VALUES
('PLATE|SPOON|GLASS', 'PLATE|GLASS|SPOON'),
('PLATE', 'SPOON'),
('OIL|JUG|MAT', 'JUG|MAT'),
('SPOON', 'SPOON'),
('OIL|MAT', 'MAT|OIL');
GO

Query:

SELECT t.*
FROM YourTable t
CROSS APPLY
(
    SELECT
        CAST('<a>'+REPLACE(t.ColA,'|','</a><a>')+'</a>' AS XML) AS XmlA,
        CAST('<b>'+REPLACE(t.ColB,'|','</b><b>')+'</b>' AS XML) AS XmlB
) caX
WHERE EXISTS
(
    SELECT 1
    FROM
    (
        (
            SELECT a.val.value('.','nvarchar(100)') AS val
            FROM caX.XmlA.nodes('/a') AS a(val)
            EXCEPT
            SELECT b.val.value('.','nvarchar(100)') AS val
            FROM caX.XmlB.nodes('/b') AS b(val)
        )
        UNION ALL
        (
            SELECT b.val.value('.','nvarchar(100)') AS val
            FROM caX.XmlB.nodes('/b') AS b(val)
            EXCEPT
            SELECT a.val.value('.','nvarchar(100)') AS val
            FROM caX.XmlA.nodes('/a') AS a(val)
        )
    ) q
);

Result:

ID | ColA        | ColB
---+-------------+--------
 2 | PLATE       | SPOON
 3 | OIL|JUG|MAT | JUG|MAT

Test on db<>fiddle here
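
One caveat with the XML approach: the cast to XML fails if a value contains characters such as & or <. A possible workaround, shown here only as a sketch, is to let FOR XML PATH('') escape the string before the delimiters are replaced. The variable @s, its sample value, and the aliases d, doc, x and n are made up for this illustration.

```sql
-- Sketch only: @s stands in for a column value that contains an ampersand.
DECLARE @s nvarchar(100) = N'SALT & PEPPER|MUG';

SELECT x.n.value('.', 'nvarchar(100)') AS val
FROM (
    -- (SELECT @s AS [*] FOR XML PATH('')) returns the string with & and <
    -- entity-escaped, so the CAST to XML no longer fails on them.
    SELECT CAST('<a>'
                + REPLACE((SELECT @s AS [*] FOR XML PATH('')), '|', '</a><a>')
                + '</a>' AS XML) AS doc
) AS d
CROSS APPLY d.doc.nodes('/a') AS x(n);
```

The value() call un-escapes the entities again, so the split values should come back in their original form.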

Extra:

In SQL Server 2016 and later, the STRING_SPLIT function is available, so this shorter alternative would work:

SELECT t.*
FROM YourTable t
WHERE EXISTS
(
  SELECT 1
  FROM STRING_SPLIT(ColA,'|') a
  FULL JOIN STRING_SPLIT(ColB,'|') b
    ON a.value = b.value
  WHERE a.value IS NULL
     OR b.value IS NULL
);
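
Note that STRING_SPLIT also requires the database compatibility level to be 130 or higher, even on a 2016+ instance. A quick way to check, as a sketch (YourDatabase in the commented-out statement is a placeholder name):

```sql
-- Show the compatibility level of the current database.
-- STRING_SPLIT needs 130 or higher.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Raising the level, if that is appropriate for your environment;
-- YourDatabase is a placeholder name.
-- ALTER DATABASE YourDatabase SET COMPATIBILITY_LEVEL = 130;
```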
