简体   繁体   中英

How to count the number of occurrence in 4 columns

I have a T-SQL query to do in the most efficient way as possible.

Here is an example of my table:

A B C D E
1 x, y z NULL NULL
2 x NULL NULL y
3 y z NULL NULL
4 a NULL b x

Now, I need to do a query to classify my best matching records. Let's say that I need to take the top 3 records that match the more of the values 'x' & 'y' (it could be more than 2 values) into the columns B, C, D, E

A NumberOfMatches Comment
1 2 Because Column B contains x, y
2 2 Because Column B contains x & Column E contains y
3 1 Because Column B contains y
4 1 Because Column E contains x

Could you help me to find a good way to do this query?

I would highly recommend that you change how your data is stored, storing delimited strings to record multiple values is a recipe for disaster. If you have a one to many relationship use a child table with a foreign key to the main table. There is very rarely a reason to store delimited data like this, and when you have to query it and manipulate it you realise why it is not advised.

Assuming that each of your columns are the same and can hold many values, you'll need to split all of them, which you can do using this:

SELECT  t.A, upvt.Col, Value = TRIM(ss.value)
FROM    #T AS t
        CROSS APPLY (VALUES ('B', t.B), ('C', t.C), ('D', t.D), ('E', t.E)) upvt (Col, Value)
        CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss;

Which gives:

A Col Value
1 B x
1 B y
1 C z
2 B x
2 E y
3 B y
3 C z
4 B a
4 D b
4 E x

With this normalised data, you can then just do a simple WHERE Value IN ('x', 'y') along with GROUP BY and COUNT(*) :

IF OBJECT_ID(N'tempdb..#T ', 'U') IS NOT NULL
    DROP TABLE #T ;

CREATE TABLE #T (A INT, B VARCHAR(4), C VARCHAR(4), D VARCHAR(4), E VARCHAR(4));

INSERT #T(A, B, C, D, E)
VALUES
    (1, 'x, y', 'z', NULL, NULL),
    (2, 'x', NULL, NULL, 'y'),
    (3, 'y', 'z', NULL, NULL),
    (4, 'a', NULL, 'b', 'x');

SELECT  t.A, NumberOfMatches = COUNT(*)
FROM    #T AS t
        CROSS APPLY (VALUES (t.B), (t.C), (t.D), (t.E)) upvt (Value)
        CROSS APPLY STRING_SPLIT(upvt.Value, ',') AS ss
WHERE   TRIM(ss.value) IN ('x', 'y')
GROUP BY t.A
ORDER BY COUNT(*) DESC, t.A;

If you can't normalize the model you can use this query to get your expected result:

select a, count(*) numberOfMatches,
concat('Because column ', string_agg(columnName, ', ')) AS comment
from (
    select a, columnName, trim(value) targetValue from (
        select a, columnName, value result
        from tbl unpivot (value for columnName in ([B], [C], [D], [E])) up
    ) t 
    outer apply string_split(result,',') 
) r 
where targetValue in ('x','y')
group by a

-- Result
/*
a   numberOfMatches comment
1   2               Because column B, B
2   2               Because column B, E
3   1               Because column B
4   1               Because column E
*/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM