
Faster SQL query with CASE in JOIN instead of CASE in SELECT statement of query?

I have a view, CommunityMembers, where each member has a primary key ID. Some members also have an old ID from another system, and some have a spouse ID. All IDs are unique.

eg:

ID | Name         | OldID   | SpouseID  | SpouseName
1  | John.Smith   | o71     | s99       | Jenna.Smith
2  | Jane.Doe     | o72     |           | 
3  | Jessie.Jones |         |           | 

I also have a view, ActivityDates, where each community member can have multiple activity dates. There are activity dates recorded against old IDs and against spouse IDs. (Unfortunately I can't clean the data up by converting old IDs to new IDs.)

eg:

ID  | ActivityDate | ActiviyType | ActivityGroup
1   | 2017-12-31   | 1           | 1
1   | 2017-12-31   | 3           | 2
1   | 2017-12-31   | 7           | 1
2   | 2017-12-31   | 1           | 1
3   | 2017-12-31   | 1           | 1
o72 | 2010-12-31   | 1           | 2
o72 | 2010-12-31   | 3           | 1
s99 | 2017-12-31   | 1           | 1
s99 | 2017-12-31   | 2           | 1

I can select the data the way I need using the following method, which repeats the CASE subqueries three times to check the three possible IDs. It is very slow, though, because it runs a select query multiple times per record:

SELECT 
    C.ID, 
    C.Name,
    C.OldID,
    C.SpouseID,
    C.SpouseName,
    CASE 
       WHEN C.ID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
            AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
            OR C.OldID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
            AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
            OR C.SpouseID IN (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType = 1 AND ActivityGroup = 1)
            AND NOT EXISTS (SELECT ID FROM ActivityDates WHERE ActivityDate > '2016-12-31' AND ActiviyType > 1 AND ActivityGroup > 1)
          THEN 'Yes' 
          ELSE '' 
       END AS Result  -- i.e. has the community member or their spouse only ever attended activity type 1 and group 1 after 2016?
FROM CommunityMembers C

So I would expect the following results, which I do get; it is just slow:

ID | Name         | OldID   | SpouseID  | SpouseName   | Result
1  | John.Smith   | o71     | s99       | Jenna.Smith  | 
2  | Jane.Doe     | o72     |           |              | Yes
3  | Jessie.Jones |         |           |              | Yes

I appreciate that there are better ways to do this, and I'm happy to hear suggestions, though I have limited flexibility in changing this system. That aside, all I am asking is: how can I make this faster? Ideally I want to join to the table and base the conditions on that, though I can't work out how. E.g.:

SELECT 
    C.ID, C.Name,
    C.OldID, C.SpouseID, C.SpouseName,
    R.Result
FROM 
    CommunityMembers C
JOIN 
    CASE WHEN Date ... Type ... Group ... ELSE ... IN ... Not Exist ... THEN ... ActivityDates R

or

SELECT 
    C.ID, C.Name,
    C.OldID, C.SpouseID, C.SpouseName,
    CASE 
       WHEN R.Date ... R.Type ... R.Group ... ELSE ... THEN 'Yes' END AS Result
FROM 
    CommunityMembers C
JOIN 
    ActivityDates R

I suspect I need to make multiple joins though I don't know how to write it.

Thank you

An index is created like this:

CREATE INDEX index_name
ON table_name (column1, column2, ...);

see this link for more details

Here is another pattern using 'optional joins' that may or may not perform better. It's not quite the same as your output, as I'm not sure what you're after there.

SELECT A.*,
COALESCE(C1.Name, C2.Name, C3.Name) As Name
FROM  ActivityDates  A
LEFT OUTER JOIN CommunityMembers As C1
ON C1.ID = A.ID
LEFT OUTER JOIN CommunityMembers As C2
ON C2.OldID = CAST(A.ID AS VARCHAR(12))
LEFT OUTER JOIN CommunityMembers As C3
ON C3.SpouseID = CAST(A.ID AS VARCHAR(12))

There are cases where this will 'double count', but if you are certain that the entire collection of IDs is unique you should be fine. If you only want to know whether an activity record exists, you can definitely speed this up by using EXISTS, but again I don't follow your logic.
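To illustrate the EXISTS idea, here is a minimal sketch that loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in engine, and the rule being checked (does any of the member's three IDs have an activity with type greater than 1 after 2016-12-31?) is a hypothetical one picked for the demonstration, not the asker's actual business rule. The result column name HasOtherActivity is made up as well.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in engine for the sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CommunityMembers (ID TEXT PRIMARY KEY, Name TEXT, OldID TEXT,
                               SpouseID TEXT, SpouseName TEXT);
-- Column spelling (ActiviyType) follows the sample table in the question.
CREATE TABLE ActivityDates (ID TEXT, ActivityDate TEXT,
                            ActiviyType INTEGER, ActivityGroup INTEGER);
INSERT INTO CommunityMembers VALUES
  ('1', 'John.Smith',   'o71', 's99', 'Jenna.Smith'),
  ('2', 'Jane.Doe',     'o72', NULL,  NULL),
  ('3', 'Jessie.Jones', NULL,  NULL,  NULL);
INSERT INTO ActivityDates VALUES
  ('1',   '2017-12-31', 1, 1), ('1',   '2017-12-31', 3, 2),
  ('1',   '2017-12-31', 7, 1), ('2',   '2017-12-31', 1, 1),
  ('3',   '2017-12-31', 1, 1), ('o72', '2010-12-31', 1, 2),
  ('o72', '2010-12-31', 3, 1), ('s99', '2017-12-31', 1, 1),
  ('s99', '2017-12-31', 2, 1);
""")

# One correlated EXISTS per member covers all three candidate IDs at once.
# EXISTS can stop at the first matching activity row.
rows = conn.execute("""
SELECT c.ID,
       CASE WHEN EXISTS (
                SELECT 1
                FROM ActivityDates a
                WHERE a.ID IN (c.ID, c.OldID, c.SpouseID)
                  AND a.ActivityDate > '2016-12-31'
                  AND a.ActiviyType > 1          -- hypothetical rule
            ) THEN 'Yes' ELSE '' END AS HasOtherActivity
FROM CommunityMembers c
ORDER BY c.ID
""").fetchall()

for row in rows:
    print(row)
```

The NULL OldID and SpouseID values are harmless inside the IN list: they simply never match an activity row.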

You want information from the ActivityDates table per ID, so group by ID and filter for the desired IDs in HAVING:

SELECT ID 
FROM ActivityDates
WHERE ActivityDate > '2016-12-31'
GROUP BY ID
HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
   AND COUNT(CASE WHEN ActiviyType > 1 AND ActivityGroup > 1 THEN 1 END) = 0

You can use this with an EXISTS clause:

select
  c.*, 
  case when exists 
  (
    SELECT a.ID 
    FROM ActivityDates a
    WHERE a.ActivityDate > '2016-12-31'
      AND a.ID in (c.id, c.oldid, c.spouseid)
    GROUP BY a.ID
    HAVING COUNT(CASE WHEN a.ActiviyType = 1 AND a.ActivityGroup = 1 THEN 1 END) > 0
       AND COUNT(CASE WHEN a.ActiviyType > 1 AND a.ActivityGroup > 1 THEN 1 END) = 0
  ) then 'Yes' else '' end as result
from CommunityMembers c;

Appropriate indexes to speed this up may be

create index idx1 on ActivityDates (ID, ActivityDate, ActivityType, ActivityGroup);

create index idx2 on ActivityDates (ActivityDate, ID, ActivityType, ActivityGroup);

Find out whether one of them gets used and drop the other (or both if neither gets used).
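One way to check whether an index gets used is to ask the engine for its query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. This is an assumption about tooling: other engines expose plans differently (for example execution plans in SQL Server, or EXPLAIN in MySQL and PostgreSQL), and the plan SQLite picks says nothing about what your own optimizer will choose. The column spelling follows the sample table in the question.

```python
import sqlite3

# Empty in-memory table with both candidate indexes from the answer above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ActivityDates (ID TEXT, ActivityDate TEXT,
                            ActiviyType INTEGER, ActivityGroup INTEGER);
CREATE INDEX idx1 ON ActivityDates (ID, ActivityDate, ActiviyType, ActivityGroup);
CREATE INDEX idx2 ON ActivityDates (ActivityDate, ID, ActiviyType, ActivityGroup);
""")

# EXPLAIN QUERY PLAN returns one row per plan step; the last column is a
# human-readable description that names any index the step uses.
plan = conn.execute("""
EXPLAIN QUERY PLAN
SELECT ID, ActiviyType, ActivityGroup
FROM ActivityDates
WHERE ID = ? AND ActivityDate > ?
""", ("1", "2016-12-31")).fetchall()

plan_details = [row[3] for row in plan]
for detail in plan_details:
    print(detail)
```

If the printed plan names only one of the two indexes for your real queries, the other is a candidate for dropping.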

It is possible that an uncorrelated subquery performs better, even though it then has to be referenced several times. Whether this even leads to a different execution plan depends on the optimizer:

with good_ids as
(
  select id 
  from activitydates
  where activitydate > '2016-12-31'
  group by id
  having count(case when activiytype = 1 and activitygroup = 1 then 1 end) > 0
     and count(case when activiytype > 1 and activitygroup > 1 then 1 end) = 0
)
select
  c.*,
  case when c.id       in (select id from good_ids)
         or c.oldid    in (select id from good_ids)
         or c.spouseid in (select id from good_ids)
       then 'Yes' else ''
  end as result
from CommunityMembers c;
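Since the question asks for a join-based form, here is a sketch of the same idea with the aggregated ID list left-joined three times, once per ID column. It runs against an in-memory SQLite database via Python's sqlite3 module, which is only a stand-in engine. One assumption to flag: the exclusion uses OR (any other type or any other group disqualifies an ID), as the last answer in this thread does, rather than the AND used above. With the sample data that variant happens to reproduce the expected output from the question, but whether it is the intended rule is a guess.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CommunityMembers (ID TEXT PRIMARY KEY, Name TEXT, OldID TEXT,
                               SpouseID TEXT, SpouseName TEXT);
CREATE TABLE ActivityDates (ID TEXT, ActivityDate TEXT,
                            ActiviyType INTEGER, ActivityGroup INTEGER);
INSERT INTO CommunityMembers VALUES
  ('1', 'John.Smith',   'o71', 's99', 'Jenna.Smith'),
  ('2', 'Jane.Doe',     'o72', NULL,  NULL),
  ('3', 'Jessie.Jones', NULL,  NULL,  NULL);
INSERT INTO ActivityDates VALUES
  ('1',   '2017-12-31', 1, 1), ('1',   '2017-12-31', 3, 2),
  ('1',   '2017-12-31', 7, 1), ('2',   '2017-12-31', 1, 1),
  ('3',   '2017-12-31', 1, 1), ('o72', '2010-12-31', 1, 2),
  ('o72', '2010-12-31', 3, 1), ('s99', '2017-12-31', 1, 1),
  ('s99', '2017-12-31', 2, 1);
""")

# good_ids has at most one row per ID (GROUP BY), so the three LEFT JOINs
# cannot multiply member rows.
rows = conn.execute("""
WITH good_ids AS (
  SELECT ID
  FROM ActivityDates
  WHERE ActivityDate > '2016-12-31'
  GROUP BY ID
  HAVING COUNT(CASE WHEN ActiviyType = 1 AND ActivityGroup = 1 THEN 1 END) > 0
     -- assumption: OR, i.e. any other type or group disqualifies the ID
     AND COUNT(CASE WHEN ActiviyType > 1 OR ActivityGroup > 1 THEN 1 END) = 0
)
SELECT c.ID,
       CASE WHEN g1.ID IS NOT NULL
              OR g2.ID IS NOT NULL
              OR g3.ID IS NOT NULL
            THEN 'Yes' ELSE '' END AS Result
FROM CommunityMembers c
LEFT JOIN good_ids g1 ON g1.ID = c.ID
LEFT JOIN good_ids g2 ON g2.ID = c.OldID
LEFT JOIN good_ids g3 ON g3.ID = c.SpouseID
ORDER BY c.ID
""").fetchall()

for row in rows:
    print(row)
```

Whether the joins beat the IN subqueries depends on the optimizer, so it is worth comparing the execution plans of both forms on the real data.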

You should explain the expected output. It is difficult to work out the correct business rule from an incorrect query.

That is how you will get the best query here. Try explaining again why IDs 2 and 3 are 'Yes', and then I will rewrite my query.

The second big mistake you are about to make is creating an index without understanding your business rule and without first writing a correct query.

Try this:

declare @t table(ID varchar(20),Name varchar(40),OldID varchar(20), SpouseID  varchar(20)
, SpouseName varchar(40))
insert into @t VALUES
('1','John.Smith','o71' ,'s99','Jenna.Smith')
,('2','Jane.Doe' ,'o72',null,null)
,('3','Jessie.Jones',null,null,null)       

--select * from @t
declare @ActivityDates table(ID varchar(20), ActivityDate date
, ActiviyType int, ActivityGroup int)
insert into @ActivityDates VALUES
('1','2017-12-31',1, 1)
,('1','2017-12-31',3, 2)
,('1','2017-12-31',7, 1)
,('2','2017-12-31',1, 1)
,('3','2017-12-31',1, 1)
,('o72','2010-12-31',1, 2)
,('o72','2010-12-31',3, 1)
,('s99','2017-12-31',1, 1)
,('s99','2017-12-31',2, 1)

SELECT t.*,
       CASE WHEN tbl.ID IS NOT NULL THEN 'Yes' ELSE NULL END AS Remarks
FROM @t t
LEFT JOIN (
    SELECT *
    FROM @ActivityDates ad
    WHERE ad.ActivityDate > '2016-12-31'
      AND ad.ActiviyType = 1
      AND ad.ActivityGroup = 1
      -- skip IDs that also have any other type or group after the cutoff
      AND NOT EXISTS (
          SELECT ad1.ID
          FROM @ActivityDates ad1
          WHERE ad1.ID = ad.ID
            AND ad1.ActivityDate > '2016-12-31'
            AND (ad1.ActiviyType > 1 OR ad1.ActivityGroup > 1)
      )
) tbl
    ON t.ID = tbl.ID
