I need to calculate the day-1 retention by user registration date. Day-1 retention is defined as the number of users who return 1 day after the registration date divided by the number of users who registered on the registration date.
Here are the tables and sample data:
CREATE TABLE registration (
user_id SERIAL PRIMARY KEY,
user_name VARCHAR(255) NOT NULL,
registrationDate TIMESTAMP NOT NULL
);
INSERT INTO registration (user_id, user_name, registrationDate)
VALUES
(0, 'John', '2018-01-01 00:01:00'),
(1, 'David', '2018-01-01 00:04:30'),
(2, 'Cassy', '2018-01-02 10:00:00'),
(3, 'Winka', '2018-01-02 14:30:00')
;
CREATE TABLE log (
user_id INTEGER,
eventDate TIMESTAMP
);
INSERT INTO log (user_id, eventDate)
VALUES
(0, '2018-01-01 01:00:00'),
(0, '2018-01-02 04:00:00'),
(0, '2018-01-04 06:00:00'),
(1, '2018-01-01 00:30:00'),
(3, '2018-01-02 14:40:00'),
(3, '2018-01-04 12:20:00'),
(3, '2018-01-06 13:30:00'),
(2, '2018-01-12 10:10:00'),
(2, '2018-01-13 09:00:00')
;
I tried to join the registration table to log table, so I can compare the date difference.
select registration.user_id, registrationDate, log.eventDate,
(log.eventDate - registration.registrationDate) as datediff
from log left join registration ON log.user_id = registration.user_id
I think I then need to filter the result with something like:
where datediff = 1
I am new to SQL and am learning it as I solve this problem. Any help or advice will be appreciated.
The expected outcome should return a table with two columns (registrationDate and retention) with rows for each date any user registered.
I am not quite sure whether this is your expected result: for registrationdate = 2018-01-01, both users logged an event within the first day, so the result is 1. For registrationdate = 2018-01-02, only one of the two users logged an event within that range, so the result is 0.5.
SELECT
    registrationdate,
    COUNT(*) FILTER (WHERE is_in_one_day) / daily_regs::decimal    -- 6
FROM (
    SELECT DISTINCT ON (l.user_id)                                 -- 4
        l.user_id,
        eventdate::date AS eventdate,
        registrationdate::date AS registrationdate,
        daily_regs,
        eventdate - registrationdate < interval '1 day' AS is_in_one_day  -- 3
    FROM log l
    JOIN (                                                         -- 2
        SELECT
            *,
            COUNT(user_id) OVER (PARTITION BY registrationdate::date) AS daily_regs  -- 1
        FROM registration
    ) r ON l.user_id = r.user_id
    ORDER BY l.user_id, eventdate
) s
GROUP BY registrationdate, daily_regs                              -- 5
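The numbered comments in the query correspond to these steps:
1. Count the total number of registrations per day with the COUNT window function.
2. Join the logs to the registrations on their user_id.
3. Calculate the difference between the eventdate and the registrationdate. Check if this is less than one day.
4. Use DISTINCT ON (l.user_id) with the ORDER BY on eventdate to keep only the first log event of each user.
5. Group by registrationdate and daily_regs.
6. Count the rows where is_in_one_day is true (the FILTER clause) and divide by the total number of registrations calculated in (1).
Day-1 retention is defined as the number of users who return 1 day after the registration date divided by the number of users who registered on the registration date.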
This interprets the definition as being based on calendar days. I would express this as:
What ratio of users come back on the day after they register?
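To make the difference between the two readings concrete, here is a minimal sketch using user 0's timestamps from the question. The helper names `within_24_hours` and `on_next_calendar_day` are hypothetical and exist only for illustration. The first mirrors an elapsed-time check (less than 24 hours after registration), and the second mirrors the calendar-day check used in this answer.

```python
from datetime import datetime, timedelta

# Values copied from the question's sample data for user 0.
registered = datetime(2018, 1, 1, 0, 1, 0)
first_event = datetime(2018, 1, 1, 1, 0, 0)
second_event = datetime(2018, 1, 2, 4, 0, 0)


def within_24_hours(registered, event):
    """Elapsed-time reading: the event is less than 24 hours after registration."""
    return timedelta(0) <= event - registered < timedelta(days=1)


def on_next_calendar_day(registered, event):
    """Calendar-day reading: the event's date is the registration date plus one day."""
    return event.date() == registered.date() + timedelta(days=1)


for event in (first_event, second_event):
    print(event, within_24_hours(registered, event), on_next_calendar_day(registered, event))
```

The second event is almost 28 hours after registration, so it fails the elapsed-time check but still counts as a return on the following calendar day.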
I think this is the simplest method:
select count(distinct l.user_id) * 1.0 / count(distinct r.user_id)
from registration r left join
log l
on l.user_id = r.user_id and
l.eventDate::date = r.registrationDate::date + interval '1 day';
The count(distinct) is only needed if multiple events can happen on a single day.
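The question asks for one row per registration date, so the query above would probably need a GROUP BY on the registration day. The following is a rough sketch of that variant, not a tested PostgreSQL solution. As an assumption made so the example is self-contained and runnable, it loads the question's sample data into an in-memory SQLite database through Python's `sqlite3` module, so SQLite's `date()` function stands in for the `::date` casts and the `interval '1 day'` arithmetic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Sample data from the question; timestamps are stored as ISO-8601 text for SQLite.
conn.executescript("""
CREATE TABLE registration (
    user_id INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL,
    registrationDate TEXT NOT NULL
);
INSERT INTO registration (user_id, user_name, registrationDate) VALUES
    (0, 'John',  '2018-01-01 00:01:00'),
    (1, 'David', '2018-01-01 00:04:30'),
    (2, 'Cassy', '2018-01-02 10:00:00'),
    (3, 'Winka', '2018-01-02 14:30:00');
CREATE TABLE log (user_id INTEGER, eventDate TEXT);
INSERT INTO log (user_id, eventDate) VALUES
    (0, '2018-01-01 01:00:00'), (0, '2018-01-02 04:00:00'),
    (0, '2018-01-04 06:00:00'), (1, '2018-01-01 00:30:00'),
    (3, '2018-01-02 14:40:00'), (3, '2018-01-04 12:20:00'),
    (3, '2018-01-06 13:30:00'), (2, '2018-01-12 10:10:00'),
    (2, '2018-01-13 09:00:00');
""")

# Same LEFT JOIN as above, grouped by registration day (calendar-day reading).
query = """
SELECT date(r.registrationDate) AS registration_day,
       COUNT(DISTINCT l.user_id) * 1.0 / COUNT(DISTINCT r.user_id) AS retention
FROM registration r
LEFT JOIN log l
       ON l.user_id = r.user_id
      AND date(l.eventDate) = date(r.registrationDate, '+1 day')
GROUP BY date(r.registrationDate)
ORDER BY registration_day
"""
rows = conn.execute(query).fetchall()
for registration_day, retention in rows:
    print(registration_day, retention)
```

With the sample data this prints 0.5 for 2018-01-01 (only John has an event on 2018-01-02) and 0.0 for 2018-01-02 (neither Cassy nor Winka has an event on 2018-01-03). Because of the LEFT JOIN, registration days with no returning users still appear in the result.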
Here is a db<>fiddle.
I'm not sure the definition is 100% useful. If you have another definition in mind, I would suggest that you ask a new question, with appropriate sample data and desired results.