Generate an ID based on Column Values

I have the following dataset that represents a mapping between a project and activity:

ProjectID  ActivityID
A          A
B          B
B          C
D          D
E          D
M          N

I'd like to calculate an ID, based on the following rules:

a project that maps 1-1 with an activity:
A - A
M - N

one project that maps to multiple activities:
B - B
B - C

one activity that maps to multiple projects:
D - D
E - D

This would generate:

ProjectID  ActivityID  CalculatedID
A          A           1
B          B           2
B          C           2
D          D           3
E          D           3
M          N           4

I hope there's enough info there; any ideas appreciated. I'm particularly interested in seeing a set-based approach.

-- UPDATE: Note on the answers -- I'd describe the approach taken by @Erwin as a classification of the mappings, in contrast to the solution provided by @mellamokb (which builds on @CodeByMoonlight's solution), which assigns a sequential ID. Both of your solutions have helped me on my way, thanks guys!

It's a little convoluted but it works:

SELECT ProjectID, ActivityID,
DENSE_RANK() OVER(ORDER BY ProjectID) +
DENSE_RANK() OVER(ORDER BY ActivityID) -
ROW_NUMBER() OVER(ORDER BY ProjectID, ActivityID) AS CalculatedID
FROM MyTable

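The two DENSE_RANK terms act as an offset against ROW_NUMBER. With the sample data, every row raises ROW_NUMBER by one. A row that introduces both a new ProjectID and a new ActivityID raises both DENSE_RANK values, so the result goes up by one; a row that repeats a ProjectID or an ActivityID raises only one of them, so the result stays the same as for the previous row.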
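
To trace that offset term by term, here is a small sketch that loads the six rows from the question into an in-memory SQLite database through Python's sqlite3 module and selects the three window functions as separate columns. SQLite is only a stand-in here (the answer targets SQL Server), the column aliases ProjectRank, ActivityRank and RowNum are made up for the illustration, and window functions need SQLite 3.25 or later.

```python
import sqlite3

# Rows from the question; MyTable is the table name used in the answer.
rows = [("A", "A"), ("B", "B"), ("B", "C"), ("D", "D"), ("E", "D"), ("M", "N")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ProjectID TEXT, ActivityID TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)

# Same three window functions as the answer, kept apart instead of combined.
query = """
SELECT ProjectID, ActivityID,
       DENSE_RANK() OVER (ORDER BY ProjectID)             AS ProjectRank,
       DENSE_RANK() OVER (ORDER BY ActivityID)            AS ActivityRank,
       ROW_NUMBER() OVER (ORDER BY ProjectID, ActivityID) AS RowNum
FROM MyTable
ORDER BY ProjectID, ActivityID
"""

# Append ProjectRank + ActivityRank - RowNum as the last element of each tuple.
trace = [
    (p, a, p_rank, a_rank, row_num, p_rank + a_rank - row_num)
    for p, a, p_rank, a_rank, row_num in conn.execute(query)
]
for line in trace:
    print(line)
```

For the row E - D, for example, the tuple is ('E', 'D', 4, 4, 5, 3): the ProjectID is new but the ActivityID repeats, so only one rank moved while RowNum advanced. The last elements read 1, 2, 2, 3, 3, 4, matching the expected CalculatedID column.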

Here's a solution building on @CodeByMoonlight's answer that handles the case where the ActivityID and ProjectID values can be interleaved, i.e., a row's ActivityID sorts after those of some other rows while its ProjectID sorts before theirs:

SELECT
    D.ProjectID,
    D.ActivityID,
    -- generate id based on the three different scenarios
    -- 1) projects with 1-many activities, use project id
    -- 2) activities with 1-many projects, use activity id
    -- 3) 1-1, use project id
    DENSE_RANK() OVER (ORDER BY
        CASE
            WHEN P.ProjectID IS NOT NULL THEN P.ProjectID
            WHEN A.ActivityID IS NOT NULL THEN A.ActivityID
            ELSE D.ProjectID
        END
    ) AS Identifier
FROM
    MyTable D
LEFT JOIN
(
    -- projects with 1-many activities
    SELECT ProjectID
    FROM MyTable
    GROUP BY ProjectID
    HAVING COUNT(ActivityID) > 1
) P ON P.ProjectID = D.ProjectID
LEFT JOIN
(
    -- activities with 1-many projects
    SELECT ActivityID
    FROM MyTable
    GROUP BY ActivityID
    HAVING COUNT(ProjectID) > 1
) A ON A.ActivityID = D.ActivityID

Sample Input:

B   C
A   A
B   B
B   G
D   D
B   F
E   D
M   N

Sample Output:

A   A   1
B   B   2
B   G   2
B   F   2
B   C   2
E   D   3
D   D   3
M   N   4
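
To see why the interleaved case matters, the sketch below runs both the DENSE_RANK/ROW_NUMBER formula from @CodeByMoonlight's answer and the query above against this sample input. It uses an in-memory SQLite database through Python's sqlite3 module as a stand-in for SQL Server (an assumption; window functions need SQLite 3.25 or later), and the Python variable names are made up for the illustration.

```python
import sqlite3

# Sample input from this answer: activities F and G sort after D,
# while their project B sorts before D and E (the interleaved case).
rows = [("B", "C"), ("A", "A"), ("B", "B"), ("B", "G"),
        ("D", "D"), ("B", "F"), ("E", "D"), ("M", "N")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ProjectID TEXT, ActivityID TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", rows)

# Formula from the earlier answer.
rank_offset_query = """
SELECT ProjectID, ActivityID,
       DENSE_RANK() OVER (ORDER BY ProjectID) +
       DENSE_RANK() OVER (ORDER BY ActivityID) -
       ROW_NUMBER() OVER (ORDER BY ProjectID, ActivityID) AS CalculatedID
FROM MyTable
ORDER BY ProjectID, ActivityID
"""

# Query from this answer: rank rows by a per-group key.
grouping_key_query = """
SELECT D.ProjectID, D.ActivityID,
       DENSE_RANK() OVER (ORDER BY
           CASE
               WHEN P.ProjectID IS NOT NULL THEN P.ProjectID
               WHEN A.ActivityID IS NOT NULL THEN A.ActivityID
               ELSE D.ProjectID
           END
       ) AS Identifier
FROM MyTable D
LEFT JOIN (SELECT ProjectID FROM MyTable
           GROUP BY ProjectID HAVING COUNT(ActivityID) > 1) P
       ON P.ProjectID = D.ProjectID
LEFT JOIN (SELECT ActivityID FROM MyTable
           GROUP BY ActivityID HAVING COUNT(ProjectID) > 1) A
       ON A.ActivityID = D.ActivityID
ORDER BY D.ProjectID, D.ActivityID
"""

# IDs in (ProjectID, ActivityID) order: A-A, B-B, B-C, B-F, B-G, D-D, E-D, M-N
rank_offset_ids = [r[2] for r in conn.execute(rank_offset_query)]
grouping_key_ids = [r[2] for r in conn.execute(grouping_key_query)]
print(rank_offset_ids)
print(grouping_key_ids)
```

On this input the rank-offset formula yields 1, 2, 2, 3, 3, 1, 1, 4: project B is split across two IDs, and D - D and E - D collide with A - A. The grouping-key query yields 1, 2, 2, 2, 2, 3, 3, 4, in line with the sample output above.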

;WITH p AS (
    SELECT ProjectID FROM tbl GROUP BY ProjectID HAVING count(*) > 1
), a AS (
    SELECT ActivityID FROM tbl GROUP BY ActivityID HAVING count(*) > 1
)
SELECT t.*
      ,CASE
         WHEN p.ProjectID  IS NOT NULL
          AND a.ActivityID IS NOT NULL THEN 4 -- n:m (missing in question!)
         WHEN p.ProjectID  IS NOT NULL THEN 2 -- 1:n
         WHEN a.ActivityID IS NOT NULL THEN 3 -- n:1
         ELSE                               1 -- 1:1
       END AS CalculatedID
FROM   tbl AS t
LEFT   JOIN p ON p.ProjectID = t.ProjectID
LEFT   JOIN a ON a.ActivityID = t.ActivityID

Explanation:

  1) In CTE p, find all projects that have more than one activity.
  2) In CTE a, find all activities that have more than one project.
  3) LEFT JOIN the findings to the base table and distinguish 4 cases in a CASE expression.

I added case 4 (n:m) that is missing in the question.
See the working demo on data.SE.
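
For a local check without data.SE, here is a sketch that runs the same query in an in-memory SQLite database through Python's sqlite3 module (a stand-in for SQL Server, so the leading semicolon is dropped). The first six rows come from the question; the rows X - Y, X - Z and W - Z are hypothetical and were added only so that case 4 (n:m) shows up.

```python
import sqlite3

question_rows = [("A", "A"), ("B", "B"), ("B", "C"),
                 ("D", "D"), ("E", "D"), ("M", "N")]
# Hypothetical extra rows (not in the question): X-Z has both a repeated
# project (X) and a repeated activity (Z), which is the n:m case.
extra_rows = [("X", "Y"), ("X", "Z"), ("W", "Z")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ProjectID TEXT, ActivityID TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?)", question_rows + extra_rows)

classify_query = """
WITH p AS (
    SELECT ProjectID FROM tbl GROUP BY ProjectID HAVING count(*) > 1
), a AS (
    SELECT ActivityID FROM tbl GROUP BY ActivityID HAVING count(*) > 1
)
SELECT t.ProjectID, t.ActivityID,
       CASE
         WHEN p.ProjectID  IS NOT NULL
          AND a.ActivityID IS NOT NULL THEN 4 -- n:m
         WHEN p.ProjectID  IS NOT NULL THEN 2 -- 1:n
         WHEN a.ActivityID IS NOT NULL THEN 3 -- n:1
         ELSE                               1 -- 1:1
       END AS CalculatedID
FROM tbl AS t
LEFT JOIN p ON p.ProjectID = t.ProjectID
LEFT JOIN a ON a.ActivityID = t.ActivityID
ORDER BY t.ProjectID, t.ActivityID
"""

# Map each (ProjectID, ActivityID) pair to its mapping type.
classified = {(p, a): kind for p, a, kind in conn.execute(classify_query)}
for pair, kind in classified.items():
    print(pair, kind)
```

A - A and M - N both come out as 1 here, because the value names the type of mapping rather than numbering each group. Of the hypothetical rows, X - Y is 2, W - Z is 3, and X - Z is 4.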
