Using BigQuery and Standard SQL I am trying to calculate the retention rate for users seen in one period, compared to users seen in period after. I want to calculate this daily, using the same period offset which will slide as time passes by.
The data used is Google Analytics data, which include these fields: https://support.google.com/analytics/answer/3437719?hl=en
I have the query to calculate retention, and I know how to set this up to run daily, but I would like to create some history to start with, and then I need to simulate that this query has been running daily for some time.
So my question is how can I create retention rate numbers, for each day, given that the first period is eg 60-31 days ago and the following period is 30-1 day(s) ago, all relative to each day.
The query I have:
WITH
users_seen_on_start AS (
SELECT DISTINCT
fullVisitorId AS users
FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY))
),
num_users AS (
SELECT
count(*) AS num_users_in_cohort
FROM users_seen_on_start
),
engaged_user_by_day AS (
SELECT
COUNT (DISTINCT fullVisitorId) as num_engaged_users
FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits INNER JOIN users_seen_on_start ON users = fullVisitorId
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)
SELECT
num_engaged_users,
num_users_in_cohort ,
ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM engaged_user_by_day CROSS JOIN num_users
Output from query:
num_engaged_users num_users_in_cohort retention_rate
100871 130632 0.772
Desired output:
date num_engaged_users num_users_in_cohort retention_rate
20190101 100871 130632 0.772
20190102 102356 128044 0.799
Sample data: users_seen_on_start:
CREATE TABLE users_seen_on_start(
users INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO users_seen_on_start(users) VALUES (6854940999573646134);
INSERT INTO users_seen_on_start(users) VALUES (9215697890064860396);
INSERT INTO users_seen_on_start(users) VALUES (5595285367064974856);
INSERT INTO users_seen_on_start(users) VALUES (2054889847396937366);
INSERT INTO users_seen_on_start(users) VALUES (2159837518531156200);
INSERT INTO users_seen_on_start(users) VALUES (2297077047785095499);
INSERT INTO users_seen_on_start(users) VALUES (15934479773952228986);
INSERT INTO users_seen_on_start(users) VALUES (18388188973174323198);
INSERT INTO users_seen_on_start(users) VALUES (13527051077114159514);
INSERT INTO users_seen_on_start(users) VALUES (10527965347657532651);
INSERT INTO users_seen_on_start(users) VALUES (10056509199904199853);
INSERT INTO users_seen_on_start(users) VALUES (721447367663373337);
INSERT INTO users_seen_on_start(users) VALUES (7418392997259835212);
INSERT INTO users_seen_on_start(users) VALUES (1739158781654194388);
INSERT INTO users_seen_on_start(users) VALUES (13485010633919602577);
INSERT INTO users_seen_on_start(users) VALUES (11647513515368913077);
INSERT INTO users_seen_on_start(users) VALUES (14723171573825482124);
INSERT INTO users_seen_on_start(users) VALUES (316809625899342248);
INSERT INTO users_seen_on_start(users) VALUES (736697877724685769);
INSERT INTO users_seen_on_start(users) VALUES (1069762618672583190);
INSERT INTO users_seen_on_start(users) VALUES (6216571959193109764);
INSERT INTO users_seen_on_start(users) VALUES (8276320148745358024);
INSERT INTO users_seen_on_start(users) VALUES (4390033140354437765);
INSERT INTO users_seen_on_start(users) VALUES (4691956767605638049);
INSERT INTO users_seen_on_start(users) VALUES (8853050929187030210);
INSERT INTO users_seen_on_start(users) VALUES (4866380534293592106);
INSERT INTO users_seen_on_start(users) VALUES (9336123194114580988);
INSERT INTO users_seen_on_start(users) VALUES (9102157575556710064);
INSERT INTO users_seen_on_start(users) VALUES (5656668438554927436);
INSERT INTO users_seen_on_start(users) VALUES (1488391481235428518);
INSERT INTO users_seen_on_start(users) VALUES (2840931994944989396);
INSERT INTO users_seen_on_start(users) VALUES (2881922818148829205);
INSERT INTO users_seen_on_start(users) VALUES (15266979732129081227);
INSERT INTO users_seen_on_start(users) VALUES (17452034639473427980);
INSERT INTO users_seen_on_start(users) VALUES (16885946609916150102);
INSERT INTO users_seen_on_start(users) VALUES (11414196691107747488);
INSERT INTO users_seen_on_start(users) VALUES (11428367061145620067);
INSERT INTO users_seen_on_start(users) VALUES (11589939716097939663);
INSERT INTO users_seen_on_start(users) VALUES (9471966568512958356);
INSERT INTO users_seen_on_start(users) VALUES (10302973548806993195);
INSERT INTO users_seen_on_start(users) VALUES (11655856328192191298);
INSERT INTO users_seen_on_start(users) VALUES (13935768668138194306);
INSERT INTO users_seen_on_start(users) VALUES (12094331062677811830);
INSERT INTO users_seen_on_start(users) VALUES (10077917656361210181);
INSERT INTO users_seen_on_start(users) VALUES (12524832796889539656);
INSERT INTO users_seen_on_start(users) VALUES (12545063140779927439);
INSERT INTO users_seen_on_start(users) VALUES (12842029924433102779);
INSERT INTO users_seen_on_start(users) VALUES (642899976804427792);
INSERT INTO users_seen_on_start(users) VALUES (6445850403127955479);
INSERT INTO users_seen_on_start(users) VALUES (6564816699382875533);
INSERT INTO users_seen_on_start(users) VALUES (4991902095596735494);
INSERT INTO users_seen_on_start(users) VALUES (9240481697386039624);
INSERT INTO users_seen_on_start(users) VALUES (7462479064488338261);
INSERT INTO users_seen_on_start(users) VALUES (7954751513116206324);
INSERT INTO users_seen_on_start(users) VALUES (7916442053878140133);
INSERT INTO users_seen_on_start(users) VALUES (5943783673017806941);
INSERT INTO users_seen_on_start(users) VALUES (8019839094524452470);
INSERT INTO users_seen_on_start(users) VALUES (1325025305488677572);
INSERT INTO users_seen_on_start(users) VALUES (2757934917480873578);
INSERT INTO users_seen_on_start(users) VALUES (2953784252203011629);
INSERT INTO users_seen_on_start(users) VALUES (15663806630564163334);
INSERT INTO users_seen_on_start(users) VALUES (15822234287772625947);
INSERT INTO users_seen_on_start(users) VALUES (16086946171320332009);
INSERT INTO users_seen_on_start(users) VALUES (18326627563023885086);
INSERT INTO users_seen_on_start(users) VALUES (10177146105583960910);
INSERT INTO users_seen_on_start(users) VALUES (11536897925313534298);
INSERT INTO users_seen_on_start(users) VALUES (9521017502744452252);
INSERT INTO users_seen_on_start(users) VALUES (13846940074652198876);
INSERT INTO users_seen_on_start(users) VALUES (12072437921316824606);
INSERT INTO users_seen_on_start(users) VALUES (12130911952369094625);
INSERT INTO users_seen_on_start(users) VALUES (9944467373343765193);
INSERT INTO users_seen_on_start(users) VALUES (10130381296105130198);
INSERT INTO users_seen_on_start(users) VALUES (14503197259750222085);
INSERT INTO users_seen_on_start(users) VALUES (14514330935424697592);
INSERT INTO users_seen_on_start(users) VALUES (14700594671656063689);
INSERT INTO users_seen_on_start(users) VALUES (14735848995356133105);
INSERT INTO users_seen_on_start(users) VALUES (14880164794899465972);
INSERT INTO users_seen_on_start(users) VALUES (12844072088150040888);
INSERT INTO users_seen_on_start(users) VALUES (13244940156815171079);
INSERT INTO users_seen_on_start(users) VALUES (260647987138229776);
INSERT INTO users_seen_on_start(users) VALUES (4223941834652936903);
INSERT INTO users_seen_on_start(users) VALUES (7094513577121476923);
INSERT INTO users_seen_on_start(users) VALUES (9277783379267650134);
INSERT INTO users_seen_on_start(users) VALUES (5996874826262341487);
INSERT INTO users_seen_on_start(users) VALUES (6070125918500272373);
INSERT INTO users_seen_on_start(users) VALUES (1530161613058767114);
INSERT INTO users_seen_on_start(users) VALUES (3564084216977083409);
INSERT INTO users_seen_on_start(users) VALUES (2096791516261012274);
INSERT INTO users_seen_on_start(users) VALUES (17168509252900308876);
INSERT INTO users_seen_on_start(users) VALUES (17616873481220648376);
INSERT INTO users_seen_on_start(users) VALUES (17998058763193684336);
INSERT INTO users_seen_on_start(users) VALUES (16355698068697852664);
INSERT INTO users_seen_on_start(users) VALUES (18429432702234588790);
INSERT INTO users_seen_on_start(users) VALUES (11376613349708048591);
INSERT INTO users_seen_on_start(users) VALUES (11409024415220391296);
INSERT INTO users_seen_on_start(users) VALUES (11497048558563286896);
INSERT INTO users_seen_on_start(users) VALUES (11461240236069178124);
INSERT INTO users_seen_on_start(users) VALUES (10315048118076394592);
INSERT INTO users_seen_on_start(users) VALUES (10534194857330671443);
INSERT INTO users_seen_on_start(users) VALUES (13087206783728054302);
Sample data engaged_user_by_day:
CREATE TABLE engaged_user_by_day(
users INTEGER NOT NULL PRIMARY KEY
);
INSERT INTO engaged_user_by_day(users) VALUES (6854940999573646134);
INSERT INTO engaged_user_by_day(users) VALUES (9215697890064860396);
INSERT INTO engaged_user_by_day(users) VALUES (5595285367064974856);
INSERT INTO engaged_user_by_day(users) VALUES (2054889847396937366);
INSERT INTO engaged_user_by_day(users) VALUES (2159837518531156200);
INSERT INTO engaged_user_by_day(users) VALUES (2297077047785095499);
INSERT INTO engaged_user_by_day(users) VALUES (15934479773952228986);
INSERT INTO engaged_user_by_day(users) VALUES (18388188973174323198);
INSERT INTO engaged_user_by_day(users) VALUES (13527051077114159514);
INSERT INTO engaged_user_by_day(users) VALUES (10527965347657532651);
INSERT INTO engaged_user_by_day(users) VALUES (10056509199904199853);
INSERT INTO engaged_user_by_day(users) VALUES (721447367663373337);
INSERT INTO engaged_user_by_day(users) VALUES (7418392997259835212);
INSERT INTO engaged_user_by_day(users) VALUES (1739158781654194388);
INSERT INTO engaged_user_by_day(users) VALUES (13485010633919602577);
INSERT INTO engaged_user_by_day(users) VALUES (11647513515368913077);
INSERT INTO engaged_user_by_day(users) VALUES (14723171573825482124);
INSERT INTO engaged_user_by_day(users) VALUES (316809625899342248);
INSERT INTO engaged_user_by_day(users) VALUES (736697877724685769);
INSERT INTO engaged_user_by_day(users) VALUES (1069762618672583190);
INSERT INTO engaged_user_by_day(users) VALUES (6216571959193109764);
INSERT INTO engaged_user_by_day(users) VALUES (8276320148745358024);
INSERT INTO engaged_user_by_day(users) VALUES (4390033140354437765);
INSERT INTO engaged_user_by_day(users) VALUES (4691956767605638049);
INSERT INTO engaged_user_by_day(users) VALUES (8853050929187030210);
INSERT INTO engaged_user_by_day(users) VALUES (4866380534293592106);
INSERT INTO engaged_user_by_day(users) VALUES (9336123194114580988);
INSERT INTO engaged_user_by_day(users) VALUES (9102157575556710064);
INSERT INTO engaged_user_by_day(users) VALUES (5656668438554927436);
INSERT INTO engaged_user_by_day(users) VALUES (1488391481235428518);
INSERT INTO engaged_user_by_day(users) VALUES (2840931994944989396);
INSERT INTO engaged_user_by_day(users) VALUES (2881922818148829205);
INSERT INTO engaged_user_by_day(users) VALUES (15266979732129081227);
INSERT INTO engaged_user_by_day(users) VALUES (17452034639473427980);
INSERT INTO engaged_user_by_day(users) VALUES (16885946609916150102);
INSERT INTO engaged_user_by_day(users) VALUES (11414196691107747488);
INSERT INTO engaged_user_by_day(users) VALUES (11428367061145620067);
INSERT INTO engaged_user_by_day(users) VALUES (11589939716097939663);
INSERT INTO engaged_user_by_day(users) VALUES (9471966568512958356);
INSERT INTO engaged_user_by_day(users) VALUES (10302973548806993195);
INSERT INTO engaged_user_by_day(users) VALUES (11655856328192191298);
INSERT INTO engaged_user_by_day(users) VALUES (13935768668138194306);
INSERT INTO engaged_user_by_day(users) VALUES (12094331062677811830);
INSERT INTO engaged_user_by_day(users) VALUES (10077917656361210181);
INSERT INTO engaged_user_by_day(users) VALUES (12524832796889539656);
INSERT INTO engaged_user_by_day(users) VALUES (12545063140779927439);
INSERT INTO engaged_user_by_day(users) VALUES (12842029924433102779);
INSERT INTO engaged_user_by_day(users) VALUES (642899976804427792);
INSERT INTO engaged_user_by_day(users) VALUES (6445850403127955479);
INSERT INTO engaged_user_by_day(users) VALUES (6564816699382875533);
INSERT INTO engaged_user_by_day(users) VALUES (4991902095596735494);
INSERT INTO engaged_user_by_day(users) VALUES (9240481697386039624);
INSERT INTO engaged_user_by_day(users) VALUES (7462479064488338261);
Rolling data in SQL can be implemented using a self-join. Please see an example of this below that finds data from the past 30 days relative to each day:
from atable a1
left join atable a2 on a1.custid=a2.custid
and datediff(day,a1.dt,a2.dt)<=30 and datediff(day,a1.dt,a2.dt)>0
There are some links that address a time window:
The BigQuery LAG function .
Query: Daily retention
Retention , Using BigQuery and a Simple Data Model
However, you use case sounds more like Custom Retention Cohorts Using BigQuery and Google Analytics . Even when the link describe a use case for Firebase, the queries explained in there address your main inquiry. For example, the following query address the retention across every three days:
WITH analytics_data AS (
SELECT user_pseudo_id, event_timestamp, event_name, device.mobile_model_name,
UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00")) AS start_day,
3600*1000*1000*24*3 AS one_week_micros -- Note the *3 at the end here
FROM `firebase-public-project.analytics_153293282.events_*`
WHERE _table_suffix BETWEEN '20180731' AND '20180829'
)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.