简体   繁体   中英

Querying large table in postgresql

I have very slow query which causes timeout error. I need all this data and sometimes even grouped by month.

Requirements are that I have to display data old 4 years. Results are displayed in a form of grid which has pagination so it must count all this results also which is performance intensive.

I was thinking about running crons which will calculate new rows so we don't have to use aggregated functions on flow, but what should I do with old data(100 millions of rows)?

Problematic Query - sometimes I need to group by and count results

SELECT
(SUM(onsite) / NULLIF(SUM(sessions),0)) as sumonsite,
SUM(onsite) as sum_onsite,
SUM(bounce_count) as bounce_count,
SUM(bounce_desktop) as bounce_pc,
SUM(bounce_mobile) as bounce_mobile,
SUM(bounce_tablet) as bounce_tablet,
(SUM(bounce_desktop) / NULLIF(SUM(uniques_desktop),0)) * 100 as bounce_pc,
(SUM(bounce_mobile) / NULLIF(SUM(uniques_mobile_phone),0)) * 100 as bounce_mobile,
(SUM(bounce_tablet) / NULLIF(SUM(uniques_tablet),0)) * 100 as bounce_tablet,
SUM(sessions) as sessions,
SUM(quality_3) as quality_3,
SUM(quality_2) as quality_2,
SUM(quality_1) as quality_1,
(SUM(amount)::float / NULLIF(SUM(uniques),0)) * 1000 as avg_cpc,
(SUM(bounce_count)::float / NULLIF(SUM(sessions),0)) * 100 as sumbounce,
(AVG(quality_3) / NULLIF(AVG(uniques),0)) * 100 as hq_quality,
(AVG(quality_2) / NULLIF(AVG(uniques),0)) * 100 as mq_quality,
(AVG(quality_1) / NULLIF(AVG(uniques),0)) * 100 as lq_quality,
SUM(cast(money_bonus as numeric(15,2))) as activity,
SUM(money_volume) as volume,
SUM(amount) as sumamount,
(SUM(clicks)::float / NULLIF(SUM(sessions),0)) as pages_per_visit,
SUM(add_par_1) as video_views,
SUM(add_par_3) as video_views_clicks,
((SUM(add_par_1)::decimal / NULLIF(SUM(sessions)::decimal,0))*100)::decimal(15,2) as sum_video_views,
100 * SUM(uniques_mobile_phone) ::FLOAT / SUM (uniques)::FLOAT AS uniques_mobile_phone,
100 * SUM(uniques_tablet)::FLOAT / SUM (uniques)::FLOAT AS uniques_tablet
FROM "aff_ref" "t" LEFT JOIN affiliate_domains ad ON ad.domain = t.referer AND ad.affiliate_id=t.affiliate_id WHERE ((DATE(day) >= '2013-12-14') AND (DATE(day) <= '2018-01-20'))

Table which is query

Size - 50 GB Number of rows - 94 917 680 Every day ~500K new rows are added

    CREATE TABLE aff_ref
(
    site_id INTEGER NOT NULL,
    day DATE NOT NULL,
    affiliate_id INTEGER DEFAULT 0 NOT NULL,
    referer VARCHAR(250) NOT NULL,
    uniques INTEGER DEFAULT 0 NOT NULL,
    uniques_hq INTEGER DEFAULT 0 NOT NULL,
    clicks INTEGER DEFAULT 0 NOT NULL,
    uniques_bounce_count BIGINT DEFAULT (0)::bigint NOT NULL,
    avg_clicks_all DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    bounce_count INTEGER DEFAULT 0 NOT NULL,
    bounce DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    onsite BIGINT DEFAULT (0)::bigint NOT NULL,
    avg_onsite DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    sessions INTEGER DEFAULT 1 NOT NULL,
    quality_1 INTEGER DEFAULT 0 NOT NULL,
    quality_2 INTEGER DEFAULT 0 NOT NULL,
    quality_3 INTEGER DEFAULT 0 NOT NULL,
    hq_1 INTEGER DEFAULT 0 NOT NULL,
    hq_2 INTEGER DEFAULT 0 NOT NULL,
    hq_3 INTEGER DEFAULT 0 NOT NULL,
    add_par_1 INTEGER DEFAULT 0 NOT NULL,
    add_par_2 INTEGER DEFAULT 0 NOT NULL,
    add_par_3 INTEGER DEFAULT 0 NOT NULL,
    add_par_4 INTEGER DEFAULT 0 NOT NULL,
    add_par_5 BIGINT DEFAULT (0)::bigint NOT NULL,
    avg_videoviews DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    avg_searches DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_prime DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_prime_low DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_prime_bounce DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_bonus DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount_basic DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    onsite_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    pageviews_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    videoviews_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    searches_coef DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_volume DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    recur_direct INTEGER DEFAULT 0 NOT NULL,
    recur_search INTEGER DEFAULT 0 NOT NULL,
    totally_fresh INTEGER DEFAULT 0 NOT NULL,
    ntv_ctr REAL DEFAULT (0)::real NOT NULL,
    top_ctr DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    videofooter_ctr DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_alt DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount_alt DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    money_prime_old DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount_old DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount_por DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    por_id BIGINT DEFAULT (0)::bigint NOT NULL,
    amount_test INTEGER DEFAULT 0 NOT NULL,
    js_time BIGINT DEFAULT (0)::bigint NOT NULL,
    js_time_mouse BIGINT DEFAULT (0)::bigint NOT NULL,
    js_exists_count BIGINT DEFAULT (0)::bigint NOT NULL,
    js_not_exists_count BIGINT DEFAULT (0)::bigint NOT NULL,
    vp_un BIGINT DEFAULT (0)::bigint NOT NULL,
    vp_un_tr BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_normal_hq BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_normal_mq BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_normal_lq BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_embed_hq BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_embed_mq BIGINT DEFAULT (0)::bigint NOT NULL,
    nb_embed_lq BIGINT DEFAULT (0)::bigint NOT NULL,
    custom_cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    amount_old_cpc DOUBLE PRECISION DEFAULT (0)::double precision NOT NULL,
    uniques_desktop INTEGER DEFAULT 0 NOT NULL,
    uniques_mobile_phone INTEGER DEFAULT 0 NOT NULL,
    uniques_tablet INTEGER DEFAULT 0 NOT NULL,
    clicks_desktop INTEGER DEFAULT 0 NOT NULL,
    clicks_mobile_phone INTEGER DEFAULT 0 NOT NULL,
    clicks_tablet INTEGER DEFAULT 0 NOT NULL,
    bounce_desktop DOUBLE PRECISION DEFAULT (0)::double precision,
    bounce_tablet DOUBLE PRECISION DEFAULT (0)::double precision,
    bounce_mobile DOUBLE PRECISION DEFAULT (0)::double precision,
    country VARCHAR(2)
);
CREATE UNIQUE INDEX ref_sites_day_aff_stype_idx ON aff_ref (day, site_id, affiliate_id, referer, country);
CREATE UNIQUE INDEX ref_sites_day_aff_stype_idx ON aff_ref (day, site_id, affiliate_id, referer);

Table which is joined size - 11 mb count of rows - 107 278

    CREATE TABLE domains
(
    id INTEGER PRIMARY KEY NOT NULL,
    affiliate_id INTEGER NOT NULL,
    domain TEXT NOT NULL,
    checked_date TIMESTAMP,
    status SMALLINT DEFAULT 0 NOT NULL,
    addedon_date TIMESTAMP(6),
    suspended_date TIMESTAMP(6),
    checked_via SMALLINT DEFAULT (1)::smallint NOT NULL,
    is_redirect SMALLINT DEFAULT (0)::smallint,
    compliance SMALLINT DEFAULT (-1) NOT NULL,
    note VARCHAR(512),
    CONSTRAINT domains_affiliate_id_fkey FOREIGN KEY (affiliate_id) REFERENCES affiliates (affiliate_id)
);


CREATE UNIQUE INDEX domains_affiliate_id_domain_key ON affiliate_domains (affiliate_id, domain);

Execution Plan:

    Aggregate  (cost=18169962.62..18169962.76 rows=1 width=123) (actual time=379233.584..379233.584 rows=1 loops=1)
  ->  Seq Scan on stats_aff_ref_sites t  (cost=0.00..8203606.20 rows=94917680 width=123) (actual time=0.005..159746.597 rows=94917677 loops=1)
        Filter: ((day >= '2013-12-14'::date) AND (day <= '2018-01-20'::date))
Planning time: 0.360 ms
Execution time: 379233.797 ms

I see here 2 simple solutions:

  • Never call the data you cannot use. Application have pages? Great - you can reduce your server load just by calling data for one page (ensure that where condition used indexed fiedls)

  • Are you need aggregated data? Great - just create table with aggregation results split by some property value (think like day/week/month). The last (current month data) will be called from main table on the fly.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM