How to compare rows and show whether historical data has changed in AWS Athena?

Question

I have a table that stores historical data. A sample of the table looks like this:

id	config	userid	team	createdon
1	"config-01"	"user-01"	"Dev"	"2022:08:28"
2	"config-02"	"user-01"	"Dev"	"2022:08:27"
3	"config-01"	"user-01"	"Dev"	"2022:08:26"
4	"config-01"	"user-01"	"QA"	"2022:08:25"
5	"config-01"	"user-01"	"QA"	"2022:08:24"
6	"config-20"	"user-01"	"QA"	"2022:08:23"
7	"config-01"	"user-01"	"QA"	"2022:08:22"
8	"config-01"	"user-01"	"Prod"	"2022:08:21"
9	"config-01"	"user-01"	"Prod"	"2022:08:20"
10	"config-11"	"user-02"	"Prod"	"2022:08:19"
11	"config-15"	"user-02"	"Prod"	"2022:08:18"

Now I need to query the data by userid, so the first 9 rows are selected for userid "user-01". From these 9 rows I need the latest two rows (based on createdon) of each team, so the intermediate result looks like this:

id	config	userid	team	createdon
1	"config-01"	"user-01"	"Dev"	"2022:08:28"
2	"config-02"	"user-01"	"Dev"	"2022:08:27"
4	"config-01"	"user-01"	"QA"	"2022:08:25"
5	"config-01"	"user-01"	"QA"	"2022:08:24"
8	"config-01"	"user-01"	"Prod"	"2022:08:21"
9	"config-01"	"user-01"	"Prod"	"2022:08:20"

Now I need to compare the two rows of each team on their config column and report whether the value changed between them. The final result I am looking for is:

userid	team	configchanged
"user-01"	"Dev"	TRUE
"user-01"	"QA"	FALSE
"user-01"	"Prod"	FALSE

So far I have reached the point where I can see rows with the same team together, using this self-join query:

select * from table t1, table t2
where t1.id <> t2.id and t1.team = t2.team and t1.userid = 'user-01'

I need help with the rest.

Any help with writing a query that produces this result would be appreciated. Thank you.

Answer 1

Not sure why this is tagged with both MySQL and Presto, but here is an approach for Presto/Trino.

You can use window functions (lag in particular, to compare the "current" and "previous" records) together with group by. Here is a slightly more generic approach (without filtering, but that is easy to add):

-- sample data
with dataset (id, config, userid, team, createdon) as (
    values (1   , 'config-01', 'user-01', 'Dev',    '2022:08:28'),
        (2  , 'config-02', 'user-01', 'Dev',    '2022:08:27'),
        (3  , 'config-01', 'user-01', 'Dev',    '2022:08:26'),
        (4  , 'config-01', 'user-01', 'QA', '2022:08:25'),
        (5  , 'config-01', 'user-01', 'QA', '2022:08:24'),
        (6  , 'config-20', 'user-01', 'QA', '2022:08:23'),
        (7  , 'config-01', 'user-01', 'QA', '2022:08:22'),
        (8  , 'config-01', 'user-01', 'Prod','2022:08:21'),
        (9  , 'config-01', 'user-01', 'Prod','2022:08:20'),
        (10 , 'config-11', 'user-02', 'Prod','2022:08:19'),
        (11 , 'config-15', 'user-02', 'Prod','2022:08:18')
)

-- query
select userid,
    team,
    -- take the change status of the latest row (greatest createdon)
    max_by(configchanged, createdon) as configchanged
from (
    select *,
        -- compare each row's config with the previous one in the same userid/team;
        -- coalesce is used just in case there is only one row
        config != coalesce(
            lag(config) over (partition by userid, team order by createdon),
            config
        ) as configchanged
    from dataset
)
group by userid, team

Output:

userid	team	configchanged
user-01	QA	false
user-01	Prod	false
user-01	Dev	true
user-02	Prod	true
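
For the filtering mentioned above, here is a minimal sketch of one way to restrict the result to a single user and compare only the latest row of each team with the row just before it. It reuses the sample data and the lag approach; the CTE name ranked and the column names rn and prev_config are made up for illustration. It also assumes createdon is stored as a fixed-width 'yyyy:MM:dd' string, so ordering it as text matches chronological order.

```sql
-- sample data copied from the question
with dataset (id, config, userid, team, createdon) as (
    values (1,  'config-01', 'user-01', 'Dev',  '2022:08:28'),
           (2,  'config-02', 'user-01', 'Dev',  '2022:08:27'),
           (3,  'config-01', 'user-01', 'Dev',  '2022:08:26'),
           (4,  'config-01', 'user-01', 'QA',   '2022:08:25'),
           (5,  'config-01', 'user-01', 'QA',   '2022:08:24'),
           (6,  'config-20', 'user-01', 'QA',   '2022:08:23'),
           (7,  'config-01', 'user-01', 'QA',   '2022:08:22'),
           (8,  'config-01', 'user-01', 'Prod', '2022:08:21'),
           (9,  'config-01', 'user-01', 'Prod', '2022:08:20'),
           (10, 'config-11', 'user-02', 'Prod', '2022:08:19'),
           (11, 'config-15', 'user-02', 'Prod', '2022:08:18')
),
-- "ranked" is a hypothetical name: one user only, each row paired with its predecessor
ranked as (
    select userid,
           team,
           config,
           -- rn = 1 marks the latest row of each team
           row_number() over (partition by userid, team order by createdon desc) as rn,
           -- config of the row created just before this one (null if there is none)
           lag(config) over (partition by userid, team order by createdon) as prev_config
    from dataset
    where userid = 'user-01'  -- the filter is applied before the window functions run
)
select userid,
       team,
       -- a team with a single row falls back to its own config and reports false
       config <> coalesce(prev_config, config) as configchanged
from ranked
where rn = 1  -- keep only the latest row, which already carries the previous config
order by team
```

With the sample data this should give Dev = true, Prod = false and QA = false for user-01, which matches the result asked for in the question. Keeping only rn = 1 is enough here because lag has already attached the second-latest config to the latest row, so the two rows never need to be joined back together.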
