
SQL to return 1 or 0 depending on values in a column's audit trail

If I were to have a table such as the one below:

id_  last_updated_by
1    robot
1    human
1    robot
2    robot
3    robot
3    human

Using SQL, how could I group by the ID and create a new column to indicate whether a human has ever updated the record like this:

id_  last_updated_by  updated_by_human
1    robot            1
2    robot            0
3    robot            1

UPDATE

I'm currently doing the following, though I'm not sure how efficient it is: I select the latest record and then join it to my calculated column via a sub-select.

SELECT  MAIN.TRANSACTION_ID,
            MAIN.CREATED_DATE,
            MAIN.CREATED_BY_USER_ID,
            MAIN.OWNER_USER_ID,
            STP.TOUCHED_BY_HUMAN_
    FROM (
            SELECT  TRANSACTION_ID,
                    CREATED_DATE,
                    CREATED_BY_USER_ID,
                    OWNER_USER_ID
            FROM    TABLE_NAME
            WHERE   CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
            AND     CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
            QUALIFY row_number() OVER (partition by TRANSACTION_ID order by End_Dt desc) = 1
) MAIN

LEFT JOIN (
        SELECT  TRANSACTION_ID,
                CASE
                WHEN   CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE')  OR
                       CREATED_BY_USER_ID LIKE 'N%' OR
                       CREATED_BY_USER_ID IS NULL
                THEN 0
                ELSE 1 END AS CREATED_BY_HUMAN,
                CASE
                WHEN   OWNER_USER_ID IN ('ROBOT', 'MACHINE')  OR
                       OWNER_USER_ID LIKE 'N%' OR
                       OWNER_USER_ID IS NULL
                THEN 0
                ELSE 1 END AS OWNED_BY_HUMAN,
                CASE
                WHEN   CREATED_BY_HUMAN = 0 AND
                       OWNED_BY_HUMAN = 0
                THEN 0
                ELSE 1 END AS TOUCHED_BY_HUMAN_
        FROM    TABLE_NAME
        WHERE   CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
        AND     CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
        QUALIFY row_number() OVER (partition by TRANSACTION_ID order by TOUCHED_BY_HUMAN_ desc) = 1
) STP
ON MAIN.TRANSACTION_ID = STP.TRANSACTION_ID

Since you didn't specify which column identifies the newest record for a given id, I assume there is a column that tracks the insert/modify timestamp (which is pretty standard table design); let's call it last_update_timestamp. If you don't have one, I'd still urge you to add it, since an audit trail without a timestamp does not make much sense.

Assuming your table is named updating_trail:

SELECT updating_trail.*, last_update_trail.modified_by_human
FROM updating_trail
INNER JOIN (
  -- per id_: the latest last_update_timestamp, plus a flag that is 1 if any record for that id_ has last_updated_by = 'human'
  SELECT updating_trail.id_, MAX(last_update_timestamp) AS most_recent_update_ts, MAX(CASE WHEN updating_trail.last_updated_by = 'human' THEN 1 ELSE 0 END) AS modified_by_human
  FROM updating_trail
  GROUP BY updating_trail.id_
) last_update_trail
ON updating_trail.id_ = last_update_trail.id_ AND updating_trail.last_update_timestamp = last_update_trail.most_recent_update_ts;

This gives:

id_  last_updated_by  last_update_timestamp     modified_by_human
1    robot            2021-10-19T20:00:00.000Z  1
2    robot            2021-10-19T17:00:00.000Z  0
3    robot            2021-10-19T16:00:00.000Z  1

Check out this sample db fiddle I created for you

A correlated EXISTS subquery is often not the fastest option, but if your data isn't big this should work.

select id_,
IF (EXISTS (SELECT 1 FROM table_name t2 WHERE t2.last_updated_by = 'human' and t2.id_ = t1.id_), 1, 0) AS updated_by_human 
from table_name t1;
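
Two caveats on the query above: IF() is MySQL syntax, and the query returns one row per audit row rather than one per id_. A minimal sketch of a more portable variant, assuming the same table_name, id_ and last_updated_by names, uses a standard CASE expression and DISTINCT:

```sql
-- Sketch: one row per id_, flagged 1 if any audit row for that id_
-- was written by 'human'. Names are taken from the answer above.
SELECT DISTINCT
       t1.id_,
       CASE WHEN EXISTS (SELECT 1
                         FROM   table_name t2
                         WHERE  t2.id_ = t1.id_
                         AND    t2.last_updated_by = 'human')
            THEN 1
            ELSE 0
       END AS updated_by_human
FROM   table_name t1;
```

On the sample data from the question this should yield 1 for ids 1 and 3 and 0 for id 2.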

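Here is another way, which returns only the ids that a human has updated:

-- Note: in MySQL, SELECT * with GROUP BY id_ is only accepted when ONLY_FULL_GROUP_BY is disabled.
SELECT * 
FROM table_name t1 
GROUP BY t1.id_
HAVING COUNT(*) > 0
AND MAX(CASE t1.last_updated_by  WHEN 'human' THEN 1 ELSE 0 END) = 1;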

If I'm following your problem, then something like this should work.

SELECT
    t.*
    ,CASE WHEN a.id_ IS NOT NULL THEN 1 ELSE 0 END AS updated_by_human
FROM table_name t
    -- a holds each id_ that has at least one row updated by 'human'
    LEFT JOIN (SELECT DISTINCT id_ FROM table_name WHERE last_updated_by = 'human') a ON t.id_ = a.id_

That takes care of the updated_by_human field, but if you also need to reduce the records in table (only keeping a subset) then you need more information to do that.
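
If one row per id is enough and the other columns can be dropped, a plain GROUP BY may be all that is needed. This is only a sketch: table_name is a placeholder, and id_ and last_updated_by are the column names from the question.

```sql
-- Sketch: collapse the audit trail to one row per id_.
-- The CASE yields 1 for rows written by 'human', so MAX is 1
-- as soon as any row for that id_ was a human update.
SELECT id_,
       MAX(CASE WHEN last_updated_by = 'human' THEN 1 ELSE 0 END) AS updated_by_human
FROM   table_name
GROUP  BY id_;
```

Keeping the latest last_updated_by value alongside the flag would still require a column that orders the rows, such as a timestamp.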

This is a 1:1 translation of your query to conditional aggregation:

SELECT  TRANSACTION_ID,
        CREATED_DATE,
        CREATED_BY_USER_ID,
        OWNER_USER_ID,
        Max(CASE
              WHEN  CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE')  OR
                    CREATED_BY_USER_ID LIKE 'N%' OR
                    CREATED_BY_USER_ID IS NULL
              THEN 0
              ELSE 1
            END) Over (PARTITION BY TRANSACTION_ID) AS CREATED_BY_HUMAN
FROM    Table_Name
WHERE   CREATED_DATE >= Cast('{start_date} 00:00:00' AS TIMESTAMP)
AND     CREATED_DATE <= Cast('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY Row_Number() Over (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1
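
The query above carries over only CREATED_BY_HUMAN. A sketch of how the same windowed MAX could cover the question's TOUCHED_BY_HUMAN flag, assuming the same QUALIFY support and the column names from the question: a row counts as robot-only when both user columns match the robot rules.

```sql
-- Sketch only: the CASE is 0 when both CREATED_BY_USER_ID and
-- OWNER_USER_ID look like robots, and 1 otherwise. MAX over the
-- partition is therefore 1 if any row in the date range was
-- touched by a human.
SELECT  TRANSACTION_ID,
        CREATED_DATE,
        CREATED_BY_USER_ID,
        OWNER_USER_ID,
        MAX(CASE
              WHEN (CREATED_BY_USER_ID IN ('ROBOT', 'MACHINE') OR
                    CREATED_BY_USER_ID LIKE 'N%' OR
                    CREATED_BY_USER_ID IS NULL)
               AND (OWNER_USER_ID IN ('ROBOT', 'MACHINE') OR
                    OWNER_USER_ID LIKE 'N%' OR
                    OWNER_USER_ID IS NULL)
              THEN 0
              ELSE 1
            END) OVER (PARTITION BY TRANSACTION_ID) AS TOUCHED_BY_HUMAN
FROM    TABLE_NAME
WHERE   CREATED_DATE >= CAST('{start_date} 00:00:00' AS TIMESTAMP)
AND     CREATED_DATE <= CAST('{end_date} 23:59:59' AS TIMESTAMP)
QUALIFY ROW_NUMBER() OVER (PARTITION BY TRANSACTION_ID ORDER BY End_Dt DESC) = 1
```

Because the window aggregate is evaluated before QUALIFY filters the rows, the flag reflects every row in the date range, not just the latest one that is kept.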
