简体   繁体   中英

BigQuery SQL: Execute query over rolling time window

Using BigQuery and Standard SQL I am trying to calculate the retention rate for users seen in one period, compared to users seen in period after. I want to calculate this daily, using the same period offset which will slide as time passes by.

The data used is Google Analytics data, which include these fields: https://support.google.com/analytics/answer/3437719?hl=en

I have the query to calculate retention, and I know how to set this up to run daily, but I would like to create some history to start with, and then I need to simulate that this query has been running daily for some time.

So my question is how can I create retention rate numbers, for each day, given that the first period is eg 60-31 days ago and the following period is 30-1 day(s) ago, all relative to each day.

The query I have:

WITH

users_seen_on_start AS (
  SELECT DISTINCT
    fullVisitorId AS users
  FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 60 DAY))
                          AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 31 DAY))
),

num_users AS (
  SELECT 
    count(*) AS num_users_in_cohort 
  FROM users_seen_on_start
),

engaged_user_by_day AS (  
  SELECT 
    COUNT (DISTINCT fullVisitorId) as num_engaged_users
  FROM `project.view.ga_sessions_*`, UNNEST(hits) as hits INNER JOIN users_seen_on_start ON users = fullVisitorId 
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                          AND FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
)

SELECT 
  num_engaged_users, 
  num_users_in_cohort ,
  ROUND((num_engaged_users / num_users_in_cohort), 3) as retention_rate
FROM engaged_user_by_day CROSS JOIN num_users 

Output from query:

num_engaged_users   num_users_in_cohort             retention_rate  
100871              130632                          0.772

Desired output:

date                    num_engaged_users   num_users_in_cohort retention_rate  
20190101                100871              130632              0.772
20190102                102356              128044              0.799

Sample data: users_seen_on_start:

CREATE TABLE users_seen_on_start(
   users INTEGER  NOT NULL PRIMARY KEY 
);
INSERT INTO users_seen_on_start(users) VALUES (6854940999573646134);
INSERT INTO users_seen_on_start(users) VALUES (9215697890064860396);
INSERT INTO users_seen_on_start(users) VALUES (5595285367064974856);
INSERT INTO users_seen_on_start(users) VALUES (2054889847396937366);
INSERT INTO users_seen_on_start(users) VALUES (2159837518531156200);
INSERT INTO users_seen_on_start(users) VALUES (2297077047785095499);
INSERT INTO users_seen_on_start(users) VALUES (15934479773952228986);
INSERT INTO users_seen_on_start(users) VALUES (18388188973174323198);
INSERT INTO users_seen_on_start(users) VALUES (13527051077114159514);
INSERT INTO users_seen_on_start(users) VALUES (10527965347657532651);
INSERT INTO users_seen_on_start(users) VALUES (10056509199904199853);
INSERT INTO users_seen_on_start(users) VALUES (721447367663373337);
INSERT INTO users_seen_on_start(users) VALUES (7418392997259835212);
INSERT INTO users_seen_on_start(users) VALUES (1739158781654194388);
INSERT INTO users_seen_on_start(users) VALUES (13485010633919602577);
INSERT INTO users_seen_on_start(users) VALUES (11647513515368913077);
INSERT INTO users_seen_on_start(users) VALUES (14723171573825482124);
INSERT INTO users_seen_on_start(users) VALUES (316809625899342248);
INSERT INTO users_seen_on_start(users) VALUES (736697877724685769);
INSERT INTO users_seen_on_start(users) VALUES (1069762618672583190);
INSERT INTO users_seen_on_start(users) VALUES (6216571959193109764);
INSERT INTO users_seen_on_start(users) VALUES (8276320148745358024);
INSERT INTO users_seen_on_start(users) VALUES (4390033140354437765);
INSERT INTO users_seen_on_start(users) VALUES (4691956767605638049);
INSERT INTO users_seen_on_start(users) VALUES (8853050929187030210);
INSERT INTO users_seen_on_start(users) VALUES (4866380534293592106);
INSERT INTO users_seen_on_start(users) VALUES (9336123194114580988);
INSERT INTO users_seen_on_start(users) VALUES (9102157575556710064);
INSERT INTO users_seen_on_start(users) VALUES (5656668438554927436);
INSERT INTO users_seen_on_start(users) VALUES (1488391481235428518);
INSERT INTO users_seen_on_start(users) VALUES (2840931994944989396);
INSERT INTO users_seen_on_start(users) VALUES (2881922818148829205);
INSERT INTO users_seen_on_start(users) VALUES (15266979732129081227);
INSERT INTO users_seen_on_start(users) VALUES (17452034639473427980);
INSERT INTO users_seen_on_start(users) VALUES (16885946609916150102);
INSERT INTO users_seen_on_start(users) VALUES (11414196691107747488);
INSERT INTO users_seen_on_start(users) VALUES (11428367061145620067);
INSERT INTO users_seen_on_start(users) VALUES (11589939716097939663);
INSERT INTO users_seen_on_start(users) VALUES (9471966568512958356);
INSERT INTO users_seen_on_start(users) VALUES (10302973548806993195);
INSERT INTO users_seen_on_start(users) VALUES (11655856328192191298);
INSERT INTO users_seen_on_start(users) VALUES (13935768668138194306);
INSERT INTO users_seen_on_start(users) VALUES (12094331062677811830);
INSERT INTO users_seen_on_start(users) VALUES (10077917656361210181);
INSERT INTO users_seen_on_start(users) VALUES (12524832796889539656);
INSERT INTO users_seen_on_start(users) VALUES (12545063140779927439);
INSERT INTO users_seen_on_start(users) VALUES (12842029924433102779);
INSERT INTO users_seen_on_start(users) VALUES (642899976804427792);
INSERT INTO users_seen_on_start(users) VALUES (6445850403127955479);
INSERT INTO users_seen_on_start(users) VALUES (6564816699382875533);
INSERT INTO users_seen_on_start(users) VALUES (4991902095596735494);
INSERT INTO users_seen_on_start(users) VALUES (9240481697386039624);
INSERT INTO users_seen_on_start(users) VALUES (7462479064488338261);
INSERT INTO users_seen_on_start(users) VALUES (7954751513116206324);
INSERT INTO users_seen_on_start(users) VALUES (7916442053878140133);
INSERT INTO users_seen_on_start(users) VALUES (5943783673017806941);
INSERT INTO users_seen_on_start(users) VALUES (8019839094524452470);
INSERT INTO users_seen_on_start(users) VALUES (1325025305488677572);
INSERT INTO users_seen_on_start(users) VALUES (2757934917480873578);
INSERT INTO users_seen_on_start(users) VALUES (2953784252203011629);
INSERT INTO users_seen_on_start(users) VALUES (15663806630564163334);
INSERT INTO users_seen_on_start(users) VALUES (15822234287772625947);
INSERT INTO users_seen_on_start(users) VALUES (16086946171320332009);
INSERT INTO users_seen_on_start(users) VALUES (18326627563023885086);
INSERT INTO users_seen_on_start(users) VALUES (10177146105583960910);
INSERT INTO users_seen_on_start(users) VALUES (11536897925313534298);
INSERT INTO users_seen_on_start(users) VALUES (9521017502744452252);
INSERT INTO users_seen_on_start(users) VALUES (13846940074652198876);
INSERT INTO users_seen_on_start(users) VALUES (12072437921316824606);
INSERT INTO users_seen_on_start(users) VALUES (12130911952369094625);
INSERT INTO users_seen_on_start(users) VALUES (9944467373343765193);
INSERT INTO users_seen_on_start(users) VALUES (10130381296105130198);
INSERT INTO users_seen_on_start(users) VALUES (14503197259750222085);
INSERT INTO users_seen_on_start(users) VALUES (14514330935424697592);
INSERT INTO users_seen_on_start(users) VALUES (14700594671656063689);
INSERT INTO users_seen_on_start(users) VALUES (14735848995356133105);
INSERT INTO users_seen_on_start(users) VALUES (14880164794899465972);
INSERT INTO users_seen_on_start(users) VALUES (12844072088150040888);
INSERT INTO users_seen_on_start(users) VALUES (13244940156815171079);
INSERT INTO users_seen_on_start(users) VALUES (260647987138229776);
INSERT INTO users_seen_on_start(users) VALUES (4223941834652936903);
INSERT INTO users_seen_on_start(users) VALUES (7094513577121476923);
INSERT INTO users_seen_on_start(users) VALUES (9277783379267650134);
INSERT INTO users_seen_on_start(users) VALUES (5996874826262341487);
INSERT INTO users_seen_on_start(users) VALUES (6070125918500272373);
INSERT INTO users_seen_on_start(users) VALUES (1530161613058767114);
INSERT INTO users_seen_on_start(users) VALUES (3564084216977083409);
INSERT INTO users_seen_on_start(users) VALUES (2096791516261012274);
INSERT INTO users_seen_on_start(users) VALUES (17168509252900308876);
INSERT INTO users_seen_on_start(users) VALUES (17616873481220648376);
INSERT INTO users_seen_on_start(users) VALUES (17998058763193684336);
INSERT INTO users_seen_on_start(users) VALUES (16355698068697852664);
INSERT INTO users_seen_on_start(users) VALUES (18429432702234588790);
INSERT INTO users_seen_on_start(users) VALUES (11376613349708048591);
INSERT INTO users_seen_on_start(users) VALUES (11409024415220391296);
INSERT INTO users_seen_on_start(users) VALUES (11497048558563286896);
INSERT INTO users_seen_on_start(users) VALUES (11461240236069178124);
INSERT INTO users_seen_on_start(users) VALUES (10315048118076394592);
INSERT INTO users_seen_on_start(users) VALUES (10534194857330671443);
INSERT INTO users_seen_on_start(users) VALUES (13087206783728054302);

Sample data engaged_user_by_day:

CREATE TABLE engaged_user_by_day(
   users INTEGER  NOT NULL PRIMARY KEY 
);
INSERT INTO engaged_user_by_day(users) VALUES (6854940999573646134);
INSERT INTO engaged_user_by_day(users) VALUES (9215697890064860396);
INSERT INTO engaged_user_by_day(users) VALUES (5595285367064974856);
INSERT INTO engaged_user_by_day(users) VALUES (2054889847396937366);
INSERT INTO engaged_user_by_day(users) VALUES (2159837518531156200);
INSERT INTO engaged_user_by_day(users) VALUES (2297077047785095499);
INSERT INTO engaged_user_by_day(users) VALUES (15934479773952228986);
INSERT INTO engaged_user_by_day(users) VALUES (18388188973174323198);
INSERT INTO engaged_user_by_day(users) VALUES (13527051077114159514);
INSERT INTO engaged_user_by_day(users) VALUES (10527965347657532651);
INSERT INTO engaged_user_by_day(users) VALUES (10056509199904199853);
INSERT INTO engaged_user_by_day(users) VALUES (721447367663373337);
INSERT INTO engaged_user_by_day(users) VALUES (7418392997259835212);
INSERT INTO engaged_user_by_day(users) VALUES (1739158781654194388);
INSERT INTO engaged_user_by_day(users) VALUES (13485010633919602577);
INSERT INTO engaged_user_by_day(users) VALUES (11647513515368913077);
INSERT INTO engaged_user_by_day(users) VALUES (14723171573825482124);
INSERT INTO engaged_user_by_day(users) VALUES (316809625899342248);
INSERT INTO engaged_user_by_day(users) VALUES (736697877724685769);
INSERT INTO engaged_user_by_day(users) VALUES (1069762618672583190);
INSERT INTO engaged_user_by_day(users) VALUES (6216571959193109764);
INSERT INTO engaged_user_by_day(users) VALUES (8276320148745358024);
INSERT INTO engaged_user_by_day(users) VALUES (4390033140354437765);
INSERT INTO engaged_user_by_day(users) VALUES (4691956767605638049);
INSERT INTO engaged_user_by_day(users) VALUES (8853050929187030210);
INSERT INTO engaged_user_by_day(users) VALUES (4866380534293592106);
INSERT INTO engaged_user_by_day(users) VALUES (9336123194114580988);
INSERT INTO engaged_user_by_day(users) VALUES (9102157575556710064);
INSERT INTO engaged_user_by_day(users) VALUES (5656668438554927436);
INSERT INTO engaged_user_by_day(users) VALUES (1488391481235428518);
INSERT INTO engaged_user_by_day(users) VALUES (2840931994944989396);
INSERT INTO engaged_user_by_day(users) VALUES (2881922818148829205);
INSERT INTO engaged_user_by_day(users) VALUES (15266979732129081227);
INSERT INTO engaged_user_by_day(users) VALUES (17452034639473427980);
INSERT INTO engaged_user_by_day(users) VALUES (16885946609916150102);
INSERT INTO engaged_user_by_day(users) VALUES (11414196691107747488);
INSERT INTO engaged_user_by_day(users) VALUES (11428367061145620067);
INSERT INTO engaged_user_by_day(users) VALUES (11589939716097939663);
INSERT INTO engaged_user_by_day(users) VALUES (9471966568512958356);
INSERT INTO engaged_user_by_day(users) VALUES (10302973548806993195);
INSERT INTO engaged_user_by_day(users) VALUES (11655856328192191298);
INSERT INTO engaged_user_by_day(users) VALUES (13935768668138194306);
INSERT INTO engaged_user_by_day(users) VALUES (12094331062677811830);
INSERT INTO engaged_user_by_day(users) VALUES (10077917656361210181);
INSERT INTO engaged_user_by_day(users) VALUES (12524832796889539656);
INSERT INTO engaged_user_by_day(users) VALUES (12545063140779927439);
INSERT INTO engaged_user_by_day(users) VALUES (12842029924433102779);
INSERT INTO engaged_user_by_day(users) VALUES (642899976804427792);
INSERT INTO engaged_user_by_day(users) VALUES (6445850403127955479);
INSERT INTO engaged_user_by_day(users) VALUES (6564816699382875533);
INSERT INTO engaged_user_by_day(users) VALUES (4991902095596735494);
INSERT INTO engaged_user_by_day(users) VALUES (9240481697386039624);
INSERT INTO engaged_user_by_day(users) VALUES (7462479064488338261);

Rolling data in SQL can be implemented using a self-join. Please see an example of this below that finds data from the past 30 days relative to each day:

from atable a1
left join atable a2 on a1.custid=a2.custid 
and datediff(day,a1.dt,a2.dt)<=30 and datediff(day,a1.dt,a2.dt)>0 

There are some links that address a time window:

However, you use case sounds more like Custom Retention Cohorts Using BigQuery and Google Analytics . Even when the link describe a use case for Firebase, the queries explained in there address your main inquiry. For example, the following query address the retention across every three days:

WITH analytics_data AS (
  SELECT user_pseudo_id, event_timestamp, event_name, device.mobile_model_name,
    UNIX_MICROS(TIMESTAMP("2018-08-01 00:00:00", "-7:00")) AS start_day,
    3600*1000*1000*24*3 AS one_week_micros   -- Note the *3 at the end here
  FROM `firebase-public-project.analytics_153293282.events_*`
  WHERE _table_suffix BETWEEN '20180731' AND '20180829'
)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM