
SQL Server query - count occurrences of values in a column with specific criteria on multiple tables

I'm trying to generate a report from three different tables on SQL Server that shows the number of occurrences of each ID from the ACCOUNTS table in the ACCOUNT_ENTRIES and USERS tables, with different criteria on each table.

Table #1: ACCOUNTS

ID          ACCOUNT_TYPE         
-------------------------
354857      Customer            
354858      Agent          
354859      Fee
354860      Customer 
354861      Customer 
354862      Agent   
354863      Cashier

Table #2: ACCOUNT_ENTRIES

ID     ACCOUNT_ID   NARRATIVE_TYPE    CREATED_AT  
-------------------------------------------------
35     Customer     Fee               2018-01-02  
36     Agent        Fee               2018-11-02
37     Fee          BalanceUpdate     2018-11-03
39     Customer     BalanceUpdate     2018-11-03  

Table #3: USERS

ID    PHONE_NUMBER  REGISTERED_BY (ACCOUNT_ID)   CREATED_AT  
------------------------------------------------------------
35    XXXXXXX       354858                       2018-01-02    
36    XXXXXXX       354877                       2018-11-02
37    XXXXXXX       354858                       2018-11-03
39    XXXXXXX       354858                       2018-11-03       

I have tried this SQL query, but I can't get the output I want:

select 
    ac.id, count(ae.id) as counter1, count(u.registered_by) as counter2 
from 
    db2inst1.accounts ac
left outer join 
    db2inst1.account_entries ae on ac.id = ae.account_id
left outer join 
    db2inst1.users u on ac.id = u.registered_by 
where 
    ae.narrative_type = 'BalanceUpdate' 
    and ae.created_at > '2018-11-30' 
    and ae.created_at < '2019-01-01' 
    and u.created_at > '2018-11-30' 
    and u.created_at < '2019-01-01' 
    and ac.account_type = 'Agent'
group by 
    ac.id

What I actually want to see is this:

ACCOUNT_ID    COUNTER1  COUNTER2   COUNTER1+COUNTER2
----------------------------------------------------
354857            20         2      22 
354858            24        23      47
354859            26        11      37
354860            27        23      50  

where COUNTER1 is the number of occurrences of the account ID in ACCOUNT_ENTRIES and COUNTER2 is the number of occurrences in USERS (via REGISTERED_BY).

Help please

I think the quick-and-dirty way to get what you want is to use count(distinct). You also need to move the filtering conditions into the on clauses, so rows are not unnecessarily filtered out:

select ac.id,
       count(distinct ae.id) as counter1,
       -- count distinct user rows; u.registered_by equals ac.id on every joined row,
       -- so count(distinct u.registered_by) could only ever return 0 or 1
       count(distinct u.id) as counter2
from db2inst1.accounts ac
left outer join db2inst1.account_entries ae
     on ac.id = ae.account_id and
        ae.narrative_type = 'BalanceUpdate' and
        ae.created_at > '2018-11-30' and
        ae.created_at < '2019-01-01'
left outer join db2inst1.users u
     on ac.id = u.registered_by and
        u.created_at > '2018-11-30' and
        u.created_at < '2019-01-01'
where ac.account_type = 'Agent'
group by ac.id;
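To sanity-check the count(distinct) approach, here is a minimal sketch that runs this query shape against an in-memory SQLite database through Python's standard sqlite3 module, loaded with the small illustrative rows used in the next answer (accounts 1 and 3 are Agents). SQLite is only a stand-in for SQL Server/DB2 here, so this checks the join-and-count logic, not dialect details. It also counts DISTINCT u.registered_by next to DISTINCT u.id: because registered_by equals ac.id in every joined row, that column can only ever yield 0 or 1, so the user count has to come from a per-row key such as u.id.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server/DB2 (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_type TEXT);
    CREATE TABLE account_entries (id INTEGER PRIMARY KEY, account_id INTEGER,
                                  narrative_type TEXT, created_at TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, phone_number TEXT,
                        registered_by INTEGER, created_at TEXT);
    INSERT INTO accounts VALUES (1, 'Agent'), (2, 'Customer'), (3, 'Agent');
    INSERT INTO account_entries VALUES
        (101, 1, 'Fee',           '2018-12-01'),
        (102, 1, 'BalanceUpdate', '2018-12-02'),
        (103, 3, 'Fee',           '2018-12-01');
    INSERT INTO users VALUES
        (1001, 'XXXXX', 1, '2018-12-01'),
        (1002, 'XXXXX', 1, '2018-12-01'),
        (1003, 'XXXXX', 2, '2018-12-01');
""")

rows = conn.execute("""
    SELECT ac.id,
           COUNT(DISTINCT ae.id)           AS counter1,
           COUNT(DISTINCT u.id)            AS counter2,
           COUNT(DISTINCT u.registered_by) AS registered_by_distinct
    FROM accounts ac
    LEFT JOIN account_entries ae
           ON ac.id = ae.account_id
          AND ae.narrative_type = 'BalanceUpdate'
          AND ae.created_at > '2018-11-30'
          AND ae.created_at < '2019-01-01'
    LEFT JOIN users u
           ON ac.id = u.registered_by
          AND u.created_at > '2018-11-30'
          AND u.created_at < '2019-01-01'
    WHERE ac.account_type = 'Agent'
    GROUP BY ac.id
    ORDER BY ac.id
""").fetchall()

print(rows)  # [(1, 1, 2, 1), (3, 0, 0, 0)]
```

Account 1 registered two users, which COUNT(DISTINCT u.id) reports correctly while the registered_by column collapses to 1.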

There are a couple of potential issues I see with the SELECT query (it's a very solid attempt, though, so nice start!)

  1. Doing a LEFT JOIN and then filtering in the WHERE clause on a column from the LEFT JOINed table effectively turns it into an INNER JOIN.

Consider these results from a left join, assuming account_id "2" doesn't have a record in the account_entries table:

SELECT * FROM accounts A LEFT JOIN account_entries B ON A.id = B.account_id

|-- accounts table --|  |----------- account_entries table ---------|
id   account_type        id    account_id  narrative_type  created_at
---------------------------------------------------------------------
1    Agent               101   1           Fee             2018-12-01
1    Agent               102   1           BalanceUpdate   2018-12-02
2    Customer            NULL  NULL        NULL            NULL
3    Agent               103   3           Fee             2018-12-01

In this case, if you add WHERE narrative_type = 'BalanceUpdate' to the query, it is evaluated for every joined row. For account_id "2", the comparison NULL = 'BalanceUpdate' evaluates to UNKNOWN rather than TRUE, so that row is filtered out. This mimics the behavior of an INNER JOIN.

To get around this, you could move the filter from the WHERE clause to the ON clause of the join (for example, ON A.id = B.account_id AND B.narrative_type = 'BalanceUpdate').

In some cases you can keep the filter in the WHERE clause and use ISNULL instead, but I don't think that makes any sense in this particular use case.
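A minimal sketch of the WHERE-versus-ON difference, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for SQL Server, loaded with the illustrative rows from the table above. With the filter in WHERE only account 1 survives; note that account 3 disappears as well, because its only entry is a Fee. With the filter in ON, all three accounts are kept, with NULLs where no entry matches.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_type TEXT);
    CREATE TABLE account_entries (id INTEGER PRIMARY KEY, account_id INTEGER,
                                  narrative_type TEXT, created_at TEXT);
    INSERT INTO accounts VALUES (1, 'Agent'), (2, 'Customer'), (3, 'Agent');
    INSERT INTO account_entries VALUES
        (101, 1, 'Fee',           '2018-12-01'),
        (102, 1, 'BalanceUpdate', '2018-12-02'),
        (103, 3, 'Fee',           '2018-12-01');
""")

# Filter in WHERE: it runs after the join, so rows whose entry columns are
# NULL or don't match are discarded, just like an INNER JOIN.
where_rows = conn.execute("""
    SELECT A.id, B.id
    FROM accounts A
    LEFT JOIN account_entries B ON A.id = B.account_id
    WHERE B.narrative_type = 'BalanceUpdate'
    ORDER BY A.id
""").fetchall()

# Filter in ON: it only decides which entries match; every account survives.
on_rows = conn.execute("""
    SELECT A.id, B.id
    FROM accounts A
    LEFT JOIN account_entries B
           ON A.id = B.account_id AND B.narrative_type = 'BalanceUpdate'
    ORDER BY A.id
""").fetchall()

print(where_rows)  # [(1, 102)]
print(on_rows)     # [(1, 102), (2, None), (3, None)]
```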


  2. Since there can be multiple records per account in both account_entries and users, joining both of them back to the accounts table gives you a partial Cartesian product: every entry for an account is paired with every user registered by that account.

For example, if you have these account_entries:

id    account_id  narrative_type  created_at
--------------------------------------------
101   1           Fee             2018-12-01
102   1           BalanceUpdate   2018-12-02
103   3           Fee             2018-12-01

And these users:

id    phone_number  registered_by  created_at
---------------------------------------------
1001  XXXXX         1              2018-12-01
1002  XXXXX         1              2018-12-01
1003  XXXXX         2              2018-12-01

Because the only relationship between the two tables is the account ID, the join has to match every account entry with every user that has the same account ID, and you end up with this:

account_id  account_entry_id  user_id
--------------------------------------------
1           101               1001
1           101               1002
1           102               1001
1           102               1002
2           NULL              1003
3           103               NULL

To get around that, you could use COUNT(DISTINCT ...) on a unique column of each table (such as the account entry id and the user id), which ignores those duplicates. This is probably fine, but on larger data sets the extra de-duplication could become a performance problem.
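To see the fan-out and the COUNT(DISTINCT ...) workaround side by side, here is a minimal sketch that loads the example rows above into an in-memory SQLite database via Python's standard sqlite3 module (SQLite is only a stand-in for SQL Server) and counts both ways. The plain COUNT reports 4 entries and 4 users for account 1, although it really has 2 of each.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_type TEXT);
    CREATE TABLE account_entries (id INTEGER PRIMARY KEY, account_id INTEGER,
                                  narrative_type TEXT, created_at TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, phone_number TEXT,
                        registered_by INTEGER, created_at TEXT);
    INSERT INTO accounts VALUES (1, 'Agent'), (2, 'Customer'), (3, 'Agent');
    INSERT INTO account_entries VALUES
        (101, 1, 'Fee',           '2018-12-01'),
        (102, 1, 'BalanceUpdate', '2018-12-02'),
        (103, 3, 'Fee',           '2018-12-01');
    INSERT INTO users VALUES
        (1001, 'XXXXX', 1, '2018-12-01'),
        (1002, 'XXXXX', 1, '2018-12-01'),
        (1003, 'XXXXX', 2, '2018-12-01');
""")

# Both tables joined straight to accounts: account 1 fans out to 2 x 2 rows.
rows = conn.execute("""
    SELECT A.id,
           COUNT(B.id)          AS naive_entries,
           COUNT(C.id)          AS naive_users,
           COUNT(DISTINCT B.id) AS distinct_entries,
           COUNT(DISTINCT C.id) AS distinct_users
    FROM accounts A
    LEFT JOIN account_entries B ON A.id = B.account_id
    LEFT JOIN users C           ON A.id = C.registered_by
    GROUP BY A.id
    ORDER BY A.id
""").fetchall()

for row in rows:
    print(row)
# (1, 4, 4, 2, 2)
# (2, 0, 1, 0, 1)
# (3, 1, 0, 1, 0)
```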

I'd prefer to do the aggregation before joining the data. This could be done with simple subqueries, or very cleanly with common table expressions (CTEs).

Here is how I'd approach the query:

WITH cte_account_entries AS
    (
        SELECT
            account_id,
            COUNT(*) account_entries
        FROM account_entries 
        WHERE   narrative_type = 'BalanceUpdate'
            AND CAST(created_at AS DATE) BETWEEN '2018-12-01' AND '2018-12-31'
        GROUP BY 
            account_id   
    ),
cte_users AS 
    (
        SELECT
            registered_by,
            COUNT(*) users
        FROM users 
        WHERE   CAST(created_at AS DATE) BETWEEN '2018-12-01' AND '2018-12-31'
        GROUP BY 
            registered_by   
    )
SELECT
    A.id account_id,
    A.account_type,
    ISNULL(B.account_entries, 0) counter1,
    ISNULL(C.users, 0) counter2,
    ISNULL(B.account_entries, 0) + ISNULL(C.users, 0) [counter1+counter2]
FROM accounts A 
LEFT JOIN cte_account_entries B
ON      A.id = B.account_id
LEFT JOIN cte_users C 
ON      A.id = C.registered_by
WHERE   A.account_type = 'Agent'
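Here is a minimal, runnable sketch of the same CTE approach, assuming SQLite through Python's sqlite3 module instead of SQL Server. Two dialect swaps are needed: SQLite has no two-argument ISNULL() function, so COALESCE (which SQL Server also supports) is used, and CAST(... AS DATE) does not strip the time part in SQLite, so SQLite's date() function is used instead. The COUNT(*) aliases are renamed to entry_count and user_count for readability. The data is the example from above plus one hypothetical extra user (1004) created at 15:30 on December 31, to show why dropping the time part matters.

```python
import sqlite3

# In-memory SQLite database as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, account_type TEXT);
    CREATE TABLE account_entries (id INTEGER PRIMARY KEY, account_id INTEGER,
                                  narrative_type TEXT, created_at TEXT);
    CREATE TABLE users (id INTEGER PRIMARY KEY, phone_number TEXT,
                        registered_by INTEGER, created_at TEXT);
    INSERT INTO accounts VALUES (1, 'Agent'), (2, 'Customer'), (3, 'Agent');
    INSERT INTO account_entries VALUES
        (101, 1, 'Fee',           '2018-12-01'),
        (102, 1, 'BalanceUpdate', '2018-12-02'),
        (103, 3, 'Fee',           '2018-12-01');
    -- 1004 is a hypothetical extra row with a time component on Dec 31.
    INSERT INTO users VALUES
        (1001, 'XXXXX', 1, '2018-12-01'),
        (1002, 'XXXXX', 1, '2018-12-01'),
        (1003, 'XXXXX', 2, '2018-12-01'),
        (1004, 'XXXXX', 3, '2018-12-31 15:30:00');
""")

rows = conn.execute("""
    WITH cte_account_entries AS (
        SELECT account_id, COUNT(*) AS entry_count
        FROM account_entries
        WHERE narrative_type = 'BalanceUpdate'
          AND date(created_at) BETWEEN '2018-12-01' AND '2018-12-31'
        GROUP BY account_id
    ),
    cte_users AS (
        SELECT registered_by, COUNT(*) AS user_count
        FROM users
        WHERE date(created_at) BETWEEN '2018-12-01' AND '2018-12-31'
        GROUP BY registered_by
    )
    SELECT A.id AS account_id,
           A.account_type,
           COALESCE(B.entry_count, 0) AS counter1,
           COALESCE(C.user_count, 0)  AS counter2,
           COALESCE(B.entry_count, 0) + COALESCE(C.user_count, 0)
               AS "counter1+counter2"
    FROM accounts A
    LEFT JOIN cte_account_entries B ON A.id = B.account_id
    LEFT JOIN cte_users C           ON A.id = C.registered_by
    WHERE A.account_type = 'Agent'
    ORDER BY A.id
""").fetchall()
print(rows)  # [(1, 'Agent', 1, 2, 3), (3, 'Agent', 0, 1, 1)]

# Without date(), the Dec 31 afternoon row sorts after '2018-12-31' and is missed.
without_date_fn = conn.execute(
    "SELECT COUNT(*) FROM users "
    "WHERE created_at BETWEEN '2018-12-01' AND '2018-12-31'"
).fetchone()[0]
print(without_date_fn)  # 3 (user 1004 is not counted)
```

Each CTE returns at most one row per account, so the final LEFT JOINs cannot multiply rows, and the last query shows the end-of-range trap that the CAST guards against in SQL Server.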

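cte_account_entries is the first common table expression; it calculates the number of account entries per account and applies the filters from the question. Note that I used CAST(... AS DATE) in case the column contains both a date and a time: without it, a value such as 2018-12-31 15:30 is later than '2018-12-31' (midnight) and would fall outside the BETWEEN range.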

cte_users is similar, but with the users table.

Finally, the outer SELECT statement brings it all together, filtering down to just the "Agent" account type. Its LEFT JOINs go to the CTEs, each of which yields at most one row per account, so there is no Cartesian product.

ISNULL is also very helpful here. If, for example, there are no account entries for an account, but there are 12 users, then you might end up trying to add them together like NULL + 12, which would yield NULL. ISNULL will convert that NULL to 0, so you get 0 + 12.
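A two-line check of that NULL arithmetic, again assuming SQLite via Python's sqlite3 module as a stand-in; COALESCE is used because SQLite has no two-argument ISNULL() function, and for this case it behaves the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL propagates through +, so a missing count wipes out the total.
raw_total = conn.execute("SELECT NULL + 12").fetchone()[0]

# Replacing NULL with 0 first gives the expected sum.
safe_total = conn.execute("SELECT COALESCE(NULL, 0) + 12").fetchone()[0]

print(raw_total, safe_total)  # None 12
```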

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM