
Querying for status updates per day from an audit log of version control?

I am currently working on an audit log that keeps track of the version history of various items, i.e. it records the actual changes along with a marker stating the type of change (created, updated or deleted).

Now with each item there is also a 'status' column showing the status of that item (open, agree, maybe).

Required query: get the count of items in each status per day, up to now. The output should look something like this:

day | status | count
---------------------
 1  |  open  |  3
 2  |  open  |  4
 2  |  maybe |  1
 2  |  agree |  2
 3  |  open  |  2
 3  |  agree |  2

and so on. I've been struggling to frame this query against the audit log table ( wc_audit_log ), which looks like the image below. There are other columns, but they are mostly text and irrelevant for this query (IMHO :)

[image: enter image description here]

I've tried playing around with various combinations of group by and order by, as well as the year, dayofmonth and month functions, but can't seem to wrap my head around how to frame this query. The trickiest parts are the 'day' boundaries and the duplicates that come from version control. That is, it's entirely possible for an item to be updated multiple times in the same day without any status change, or to transition through multiple statuses within the same day.

So in the case of status-based duplicates, the row with the latest timestamp should be selected. I.e. if an item was updated twice and the status was 'open' both times, just pick the last one. Double counting across statuses is fine, i.e. if the item was open and agreed on the same day, it's okay for it to be counted in both places.

However, I'm still unable to figure out how to frame such a query. The image above shows only the relevant columns for part of the table, but it should also give an idea of the duplicates etc. involved, which make this a non-trivial query in my opinion.

PS: The items marked as deleted wouldn't be considered so aren't part of the table above. However, the above holds true even if the item was deleted but existed 'in the past'

I think this does what you want. It counts the number of wc_ids that have any given status on each day. It does not count duplicates within a day.

select extract(year from timestamp), extract(month from timestamp),
       extract(day from timestamp),
       status, count(distinct wc_id)
from a
group by extract(year from timestamp), extract(month from timestamp),
         extract(day from timestamp), status
order by 1, 2, 3, 4

However, if there are duplicates across days, then the id gets counted twice with the same status on the two days.
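To make the counting behaviour concrete, here is a minimal sketch that runs the same idea on a handful of made-up rows. It uses Python's built-in sqlite3 module rather than MySQL, so date(timestamp) stands in for the three extract(...) calls. The sample rows and the reduced three-column schema (wc_id, status, timestamp) are assumptions for illustration, not the asker's real data.

```python
import sqlite3

# Hypothetical sample rows (wc_id, status, timestamp); not from the original table.
ROWS = [
    (1, "open",  "2012-06-01 09:00:00"),
    (1, "open",  "2012-06-01 15:00:00"),  # same item, same status, same day
    (2, "open",  "2012-06-01 10:00:00"),
    (1, "agree", "2012-06-02 08:00:00"),
    (2, "open",  "2012-06-02 11:00:00"),
    (2, "maybe", "2012-06-02 17:00:00"),  # item 2 has two statuses on this day
]

def status_counts_per_day(rows):
    """Count distinct items per (day, status) in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wc_audit_log (wc_id INTEGER, status TEXT, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO wc_audit_log VALUES (?, ?, ?)", rows)
    result = conn.execute(
        """
        SELECT date(timestamp) AS day, status, COUNT(DISTINCT wc_id) AS cnt
        FROM wc_audit_log
        GROUP BY day, status
        ORDER BY day, status
        """
    ).fetchall()
    conn.close()
    return result

for row in status_counts_per_day(ROWS):
    print(row)
```

With these rows, item 1 is counted once as 'open' on the first day despite its two updates, and item 2 is counted under both 'open' and 'maybe' on the second day, which matches the double counting the question allows.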

I reread your description a couple of times. Isn't it just:

select datediff(now(), timestamp), status, count(distinct wc_id)
from foo
group by 1,2

You might try this:

SELECT `day`, status, COUNT(wc_id) as `count`
FROM
    (SELECT DATE(timestamp) as `day`, wc_id, status, MAX(timestamp) as `max_time`
    FROM table_name
    GROUP BY `day`, wc_id, status) AS max_timestamp_per_wcid_and_status
GROUP BY `day`, status
ORDER BY `day` ASC, status DESC
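As a rough check of what the inner and outer queries each produce, the sketch below runs them on a few made-up rows. It uses Python's built-in sqlite3 module instead of MySQL (SQLite's date() plays the role of DATE()). The sample rows and the three-column schema are assumptions for illustration only.

```python
import sqlite3

# Hypothetical sample rows (wc_id, status, timestamp); not from the original table.
ROWS = [
    (1, "open",  "2012-06-01 09:00:00"),
    (1, "open",  "2012-06-01 15:00:00"),  # duplicate status on the same day
    (2, "open",  "2012-06-01 10:00:00"),
    (1, "agree", "2012-06-02 08:00:00"),
    (2, "open",  "2012-06-02 11:00:00"),
    (2, "maybe", "2012-06-02 17:00:00"),
]

# Inner query: one row per (day, item, status), keeping the latest timestamp.
DEDUP_SQL = """
    SELECT date(timestamp) AS day, wc_id, status, MAX(timestamp) AS max_time
    FROM wc_audit_log
    GROUP BY day, wc_id, status
"""

def latest_and_counts(rows):
    """Return (deduplicated rows, per-day status counts)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wc_audit_log (wc_id INTEGER, status TEXT, timestamp TEXT)"
    )
    conn.executemany("INSERT INTO wc_audit_log VALUES (?, ?, ?)", rows)
    latest = conn.execute(
        f"SELECT * FROM ({DEDUP_SQL}) ORDER BY day, wc_id, status"
    ).fetchall()
    # Outer query: count the deduplicated rows per (day, status).
    counts = conn.execute(
        f"""
        SELECT day, status, COUNT(wc_id) AS cnt
        FROM ({DEDUP_SQL})
        GROUP BY day, status
        ORDER BY day ASC, status DESC
        """
    ).fetchall()
    conn.close()
    return latest, counts

latest, counts = latest_and_counts(ROWS)
print(latest)
print(counts)
```

On this sample the two 'open' rows for item 1 on the first day collapse into the 15:00 row, so the outer COUNT(wc_id) gives the same numbers that COUNT(DISTINCT wc_id) would give without the subquery. The subquery mainly earns its keep if you also need max_time for each item.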
