I have a very slow query that causes a timeout error. I need all of this data, and sometimes I also need it grouped by month.
The requirement is that I have to display data going back 4 years. The results are shown in a paginated grid, so the query must also count all the results, which is expensive.
I was thinking about running cron jobs that precompute the new rows so we don't have to run aggregate functions on the fly, but what should I do with the old data (about 100 million rows)?
Problematic query (sometimes I also need to group by and count the results):
SELECT
(SUM(onsite) / NULLIF(SUM(sessions),0)) as sumonsite,
SUM(onsite) as sum_onsite,
SUM(bounce_count) as bounce_count,
SUM(bounce_desktop) as bounce_pc,
SUM(bounce_mobile) as bounce_mobile,
SUM(bounce_tablet) as bounce_tablet,
(SUM(bounce_desktop) / NULLIF(SUM(uniques_desktop),0)) * 100 as bounce_pc,
(SUM(bounce_mobile) / NULLIF(SUM(uniques_mobile_phone),0)) * 100 as bounce_mobile,
(SUM(bounce_tablet) / NULLIF(SUM(uniques_tablet),0)) * 100 as bounce_tablet,
SUM(sessions) as sessions,
SUM(quality_3) as quality_3,
SUM(quality_2) as quality_2,
SUM(quality_1) as quality_1,
(SUM(amount)::float / NULLIF(SUM(uniques),0)) * 1000 as avg_cpc,
(SUM(bounce_count)::float / NULLIF(SUM(sessions),0)) * 100 as sumbounce,
(AVG(quality_3) / NULLIF(AVG(uniques),0)) * 100 as hq_quality,
(AVG(quality_2) / NULLIF(AVG(uniques),0)) * 100 as mq_quality,
(AVG(quality_1) / NULLIF(AVG(uniques),0)) * 100 as lq_quality,
SUM(cast(money_bonus as numeric(15,2))) as activity,
SUM(money_volume) as volume,
SUM(amount) as sumamount,
(SUM(clicks)::float / NULLIF(SUM(sessions),0)) as pages_per_visit,
SUM(add_par_1) as video_views,
SUM(add_par_3) as video_views_clicks,
((SUM(add_par_1)::decimal / NULLIF(SUM(sessions)::decimal,0))*100)::decimal(15,2) as sum_video_views,
100 * SUM(uniques_mobile_phone) ::FLOAT / SUM (uniques)::FLOAT AS uniques_mobile_phone,
100 * SUM(uniques_tablet)::FLOAT / SUM (uniques)::FLOAT AS uniques_tablet
FROM "aff_ref" "t"
LEFT JOIN affiliate_domains ad ON ad.domain = t.referer AND ad.affiliate_id = t.affiliate_id
WHERE ((DATE(day) >= '2013-12-14') AND (DATE(day) <= '2018-01-20'))
Table being queried:
Size: 50 GB. Number of rows: 94 917 680. About 500K new rows are added every day.
CREATE TABLE aff_ref
(
site_id INTEGER NOT NULL,
day DATE NOT NULL,
affiliate_id INTEGER DEFAULT 0 NOT NULL,
referer VARCHAR(250) NOT NULL,
uniques INTEGER DEFAULT 0 NOT NULL,
uniques_hq INTEGER DEFAULT 0 NOT NULL,
clicks INTEGER DEFAULT 0 NOT NULL,
uniques_bounce_count BIGINT DEFAULT (0)::bigint NOT NULL,
avg_clicks_all DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
bounce_count INTEGER DEFAULT 0 NOT NULL,
bounce DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
onsite BIGINT DEFAULT (0)::bigint NOT NULL,
avg_onsite DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
sessions INTEGER DEFAULT 1 NOT NULL,
quality_1 INTEGER DEFAULT 0 NOT NULL,
quality_2 INTEGER DEFAULT 0 NOT NULL,
quality_3 INTEGER DEFAULT 0 NOT NULL,
hq_1 INTEGER DEFAULT 0 NOT NULL,
hq_2 INTEGER DEFAULT 0 NOT NULL,
hq_3 INTEGER DEFAULT 0 NOT NULL,
add_par_1 INTEGER DEFAULT 0 NOT NULL,
add_par_2 INTEGER DEFAULT 0 NOT NULL,
add_par_3 INTEGER DEFAULT 0 NOT NULL,
add_par_4 INTEGER DEFAULT 0 NOT NULL,
add_par_5 BIGINT DEFAULT (0)::bigint NOT NULL,
avg_videoviews DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
avg_searches DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_prime DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_prime_low DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_prime_bounce DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_bonus DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount_basic DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
onsite_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
pageviews_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
videoviews_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
searches_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_volume DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
recur_direct INTEGER DEFAULT 0 NOT NULL,
recur_search INTEGER DEFAULT 0 NOT NULL,
totally_fresh INTEGER DEFAULT 0 NOT NULL,
ntv_ctr REAL DEFAULT (0)::real NOT NULL,
top_ctr DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
videofooter_ctr DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_alt DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount_alt DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
money_prime_old DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount_old DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount_por DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
por_id BIGINT DEFAULT (0)::bigint NOT NULL,
amount_test INTEGER DEFAULT 0 NOT NULL,
js_time BIGINT DEFAULT (0)::bigint NOT NULL,
js_time_mouse BIGINT DEFAULT (0)::bigint NOT NULL,
js_exists_count BIGINT DEFAULT (0)::bigint NOT NULL,
js_not_exists_count BIGINT DEFAULT (0)::bigint NOT NULL,
vp_un BIGINT DEFAULT (0)::bigint NOT NULL,
vp_un_tr BIGINT DEFAULT (0)::bigint NOT NULL,
nb_normal_hq BIGINT DEFAULT (0)::bigint NOT NULL,
nb_normal_mq BIGINT DEFAULT (0)::bigint NOT NULL,
nb_normal_lq BIGINT DEFAULT (0)::bigint NOT NULL,
nb_embed_hq BIGINT DEFAULT (0)::bigint NOT NULL,
nb_embed_mq BIGINT DEFAULT (0)::bigint NOT NULL,
nb_embed_lq BIGINT DEFAULT (0)::bigint NOT NULL,
custom_cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
amount_old_cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
uniques_desktop INTEGER DEFAULT 0 NOT NULL,
uniques_mobile_phone INTEGER DEFAULT 0 NOT NULL,
uniques_tablet INTEGER DEFAULT 0 NOT NULL,
clicks_desktop INTEGER DEFAULT 0 NOT NULL,
clicks_mobile_phone INTEGER DEFAULT 0 NOT NULL,
clicks_tablet INTEGER DEFAULT 0 NOT NULL,
bounce_desktop DOUBLE PRECISION DEFAULT (0)::double precision,
bounce_tablet DOUBLE PRECISION DEFAULT (0)::double precision,
bounce_mobile DOUBLE PRECISION DEFAULT (0)::double precision,
country VARCHAR(2)
);
CREATE UNIQUE INDEX ref_sites_day_aff_stype_idx ON aff_ref (day, site_id, affiliate_id, referer, country);
CREATE UNIQUE INDEX ref_sites_day_aff_stype_idx ON aff_ref (day, site_id, affiliate_id, referer);
Joined table (size: 11 MB, row count: 107 278):
CREATE TABLE domains
(
id INTEGER PRIMARY KEY NOT NULL,
affiliate_id INTEGER NOT NULL,
domain TEXT NOT NULL,
checked_date TIMESTAMP,
status SMALLINT DEFAULT 0 NOT NULL,
addedon_date TIMESTAMP(6),
suspended_date TIMESTAMP(6),
checked_via SMALLINT DEFAULT (1)::smallint NOT NULL,
is_redirect SMALLINT DEFAULT (0)::smallint,
compliance SMALLINT DEFAULT (-1) NOT NULL,
note VARCHAR(512),
CONSTRAINT domains_affiliate_id_fkey FOREIGN KEY (affiliate_id) REFERENCES affiliates (affiliate_id)
);
CREATE UNIQUE INDEX domains_affiliate_id_domain_key ON affiliate_domains (affiliate_id, domain);
Execution Plan:
Aggregate (cost=18169962.62..18169962.76 rows=1 width=123) (actual time=379233.584..379233.584 rows=1 loops=1)
-> Seq Scan on stats_aff_ref_sites t (cost=0.00..8203606.20 rows=94917680 width=123) (actual time=0.005..159746.597 rows=94917677 loops=1)
Filter: ((day >= '2013-12-14'::date) AND (day <= '2018-01-20'::date))
Planning time: 0.360 ms
Execution time: 379233.797 ms
I see 2 simple solutions here:
1. Never fetch data you cannot use. Does the application paginate? Great: you can reduce the server load just by fetching the data for a single page (make sure the WHERE condition uses indexed fields).
2. Do you need aggregated data? Great: create a table that holds the aggregation results, split by some property value (for example day, week, or month). The most recent data (the current month) is then read from the main table on the fly.
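The aggregation-table idea could look roughly like the sketch below. It is only an illustration under several assumptions: the table name aff_ref_monthly, its grain (one row per month and affiliate), and the handful of columns shown are hypothetical, and ON CONFLICT requires PostgreSQL 9.5 or later. The table stores sums rather than ratios, so percentages can still be recomputed correctly over any set of months.

```sql
-- Hypothetical rollup table: one row per month and affiliate.
-- Only a few of the summed columns are shown; the rest follow the same pattern.
CREATE TABLE aff_ref_monthly (
    month        DATE             NOT NULL,  -- first day of the month
    affiliate_id INTEGER          NOT NULL,
    sessions     BIGINT           NOT NULL,
    uniques      BIGINT           NOT NULL,
    bounce_count BIGINT           NOT NULL,
    onsite       NUMERIC          NOT NULL,
    amount       DOUBLE PRECISION NOT NULL,
    PRIMARY KEY (month, affiliate_id)
);

-- Recompute one closed month. Run it once per historical month to backfill
-- the old data, then from cron for each month that has just ended.
INSERT INTO aff_ref_monthly
    (month, affiliate_id, sessions, uniques, bounce_count, onsite, amount)
SELECT date_trunc('month', day)::date, affiliate_id,
       SUM(sessions), SUM(uniques), SUM(bounce_count), SUM(onsite), SUM(amount)
FROM aff_ref
WHERE day >= DATE '2017-12-01' AND day < DATE '2018-01-01'
GROUP BY 1, 2
ON CONFLICT (month, affiliate_id) DO UPDATE
SET sessions     = EXCLUDED.sessions,
    uniques      = EXCLUDED.uniques,
    bounce_count = EXCLUDED.bounce_count,
    onsite       = EXCLUDED.onsite,
    amount       = EXCLUDED.amount;

-- Report: closed months come from the rollup, the current month from the
-- main table, and the ratios are computed from the combined sums.
WITH combined AS (
    SELECT month, sessions, uniques, bounce_count, onsite, amount
    FROM aff_ref_monthly
    WHERE month >= DATE '2014-01-01' AND month < DATE '2018-01-01'
    UNION ALL
    SELECT date_trunc('month', day)::date, sessions, uniques, bounce_count, onsite, amount
    FROM aff_ref
    WHERE day >= DATE '2018-01-01' AND day <= DATE '2018-01-20'
)
SELECT month,
       SUM(sessions) AS sessions,
       SUM(onsite) / NULLIF(SUM(sessions), 0) AS sumonsite,
       (SUM(bounce_count)::float / NULLIF(SUM(sessions), 0)) * 100 AS sumbounce,
       (SUM(amount)::float / NULLIF(SUM(uniques), 0)) * 1000 AS avg_cpc
FROM combined
GROUP BY month
ORDER BY month;
```

Backfilling month by month keeps each pass over the 50 GB table to a bounded date range instead of one full scan. The sketch assumes the requested range starts on a month boundary; a range such as 2013-12-14 onward would need the partial first month read from the main table as well, or a daily rollup instead of a monthly one.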