PostgreSQL temp-file usage very high

I am trying to migrate a system from Postgres 8.3 to 9.1. A query that runs every day on the 8.3 server takes ~10 min and < 5 GB of RAM; the exact same query on the 9.1 DB uses up all the space (200 GB+) on the default tablespace before dying (usually after an hour or two).
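Since the space is being eaten by temporary files, it can help to see how large they get before the query dies. A minimal sketch, run in the session that executes the nightly query: `log_temp_files` (superuser-only) makes the server log each temp file and its size when the file is removed, and `temp_tablespaces` sends temp files to a different tablespace. The tablespace name `big_scratch` is hypothetical and must already exist.

```sql
-- Log every temporary file (path and size) when it is deleted.
-- 0 = log all temp files; a positive value is a size threshold in kB.
-- Only superusers can change this setting.
SET log_temp_files = 0;

-- Send temp files to another tablespace instead of the default one.
-- "big_scratch" is a hypothetical tablespace created beforehand with
-- CREATE TABLESPACE; the role also needs CREATE privilege on it.
SET temp_tablespaces = 'big_scratch';
```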

The query

INSERT INTO warehouse.stream_user_facts(
  time_id,
  country,
  content_manager_id,
  preview,
  stream_type_id,
  stream_source_key_id,
  user_is_registered,
  user_id,
  gender,
  age,
  stream_count,
  stream_costs,
  user_recency_days
)
SELECT
    facts.*,
    date - last_stream AS user_recency_days
  FROM main_facts AS facts
  LEFT JOIN warehouse.times USING (time_id)
  LEFT JOIN last_streams USING (user_id);

The view it uses

  CREATE TEMP VIEW main_facts AS
    SELECT
        time_id,
        CASE WHEN stream_type_id = (SELECT stream_type_id FROM warehouse.stream_types WHERE name = 'SUBSCRIPTION')
             THEN subscription_country
             ELSE track_streams_extra.country
        END AS report_country,
        content_manager_id,
        preview,
        stream_type_id,
        stream_source_key_id,
        COALESCE(register_date <= reporting_date, false) AS user_is_registered,
        user_id,
        CASE WHEN register_date <= reporting_date
             THEN gender
        END AS gender_at_stream,
        CASE WHEN register_date <= reporting_date
             THEN EXTRACT(YEAR FROM age(reporting_date, dob))
        END AS age_at_stream,
        COUNT(1) AS stream_count,
        nullable_sum(cost) AS stream_costs
      FROM warehouse.track_streams_extra
      LEFT JOIN users USING (user_id)
      LEFT JOIN user_registration USING (user_id)
      JOIN time_period ON (reporting_date >= start AND reporting_date < past_end)
      GROUP BY time_id, report_country, content_manager_id, preview, stream_type_id, stream_source_key_id, user_is_registered, user_id, gender_at_stream, age_at_stream;
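The view's `JOIN time_period ON (reporting_date >= start AND reporting_date < past_end)` repeats each stream once per matching period, so whatever the planner believes about the size of `time_period` feeds straight into every row estimate above that join. A quick way to compare the planner's recorded figures with the real row count, assuming `time_period` is the relation the view resolves to on the current search_path:

```sql
-- What the planner has recorded for time_period. These columns are only
-- refreshed by VACUUM, ANALYZE and a few DDL commands such as CREATE INDEX;
-- relpages = 0 usually means the table has never been vacuumed or analyzed.
SELECT relname, reltuples, relpages
  FROM pg_class
 WHERE relname = 'time_period';

-- The actual number of rows.
SELECT count(*) FROM time_period;

-- Collect fresh statistics so later plans use real numbers.
ANALYZE time_period;
```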

I can't see anything wrong with it, but as I said, it works on 8.3 and not on 9.1. Are there any fundamental changes between the two versions that could cause this?

EDIT: added EXPLAIN. EDIT: added EXPLAIN version.

EXPLAIN on 9.1. I can't get EXPLAIN ANALYZE, as the query crashes and never returns; I will post the 8.3 EXPLAIN ANALYZE in a moment.

Insert on stream_user_facts  (cost=148914916373.07..321324730521.96 rows=1362701321853 width=152)
   ->  Hash Left Join  (cost=148914916373.07..321324730521.96 rows=1362701321853 width=152)
         Hash Cond: (track_streams.user_id = last_streams.user_id)
         ->  Hash Left Join  (cost=148914916319.42..246024946085.50 rows=140484672356 width=148)
               Hash Cond: (time_period.time_id = times.time_id)
               ->  GroupAggregate  (cost=148914916300.90..242688435098.53 rows=140484672356 width=95)
                     InitPlan 1 (returns $0)
                       ->  Seq Scan on stream_types  (cost=0.00..1.06 rows=1 width=2)
                             Filter: (name = 'SUBSCRIPTION'::text)
                     ->  Sort  (cost=148914916299.84..149266127980.73 rows=140484672356 width=95)
                           Sort Key: time_period.time_id, (CASE WHEN (CASE WHEN ((track_streams.stream_source_key_id = ANY ('{8,16}'::integer[])) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2012-01-01 00:00:00'::timestamp without time zone)) THEN 1 WHEN ((track_streams.flags & 2::bigint) <> 0) THEN 2 WHEN (track_streams.play_source = ANY ('{5,6,7,8}'::integer[])) THEN 3 WHEN CASE WHEN (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2011-06-01 00:00:00'::timestamp without time zone) THEN (COALESCE(track_streams.playlist_id, 0::bigint) > 0) ELSE (track_streams.playlist_id IS NOT NULL) END THEN 4 ELSE 5 END = $0) THEN (archive.subscriptions.country)::character varying ELSE track_streams.country END), t.user_id, track_streams.preview, (CASE WHEN ((track_streams.stream_source_key_id = ANY ('{8,16}'::integer[])) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2012-01-01 00:00:00'::timestamp without time zone)) THEN 1 WHEN ((track_streams.flags & 2::bigint) <> 0) THEN 2 WHEN (track_streams.play_source = ANY ('{5,6,7,8}'::integer[])) THEN 3 WHEN CASE WHEN (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= '2011-06-01 00:00:00'::timestamp without time zone) THEN (COALESCE(track_streams.playlist_id, 0::bigint) > 0) ELSE (track_streams.playlist_id IS NOT NULL) END THEN 4 ELSE 5 END), track_streams.stream_source_key_id, (COALESCE((user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END), false)), track_streams.user_id, (CASE WHEN (user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END) THEN users.gender ELSE NULL::character varying END), (CASE WHEN (user_registration.register_date <= CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END) THEN date_part('year'::text, age(CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END, (users.dob)::timestamp without time zone)) ELSE NULL::double precision END)
                           ->  Nested Loop  (cost=40230598.46..58079790334.65 rows=140484672356 width=95)
                                 Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= time_period.start) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < time_period.past_end))
                                 ->  Hash Left Join  (cost=40230598.46..481074638.44 rows=775682240 width=87)
                                       Hash Cond: (track_streams.user_id = user_registration.user_id)
                                       ->  Hash Left Join  (cost=40173344.70..448709880.28 rows=775682240 width=79)
                                             Hash Cond: (track_streams.user_id = users.user_id)
                                             ->  Hash Left Join  (cost=37218851.64..347178159.42 rows=775682240 width=73)
                                                   Hash Cond: ((t.user_id = c.content_manager_id) AND ((track_streams.country)::text = (c.country)::text))
                                                   Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= c.first_valid_day) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < c.one_past_last_valid_day))
                                                   ->  Hash Left Join  (cost=37218834.16..279295937.13 rows=775682240 width=67)
                                                         Hash Cond: (t.user_id = fallback.content_manager_id)
                                                         Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= fallback.first_valid_day) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < fallback.one_past_last_valid_day))
                                                         ->  Hash Join  (cost=37218817.98..155389716.85 rows=775682240 width=61)
                                                               Hash Cond: (track_streams.track_id = t.track_id)
                                                               ->  Hash Right Join  (cost=36223425.43..114392317.09 rows=775682240 width=61)
                                                                     Hash Cond: (archive.subscriptions.user_id = track_streams.user_id)
                                                                     Join Filter: ((CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END >= archive.subscriptions.created_at) AND (CASE WHEN (track_streams.ingestion_date IS NULL) THEN track_streams.date WHEN (date_trunc('day'::text, track_streams.date) = track_streams.ingestion_date) THEN track_streams.date ELSE (track_streams.ingestion_date)::timestamp without time zone END < COALESCE(archive.subscriptions.created_at, 'infinity'::timestamp without time zone)))
                                                                     ->  Merge Left Join  (cost=28078.03..28672.47 rows=26353 width=27)
                                                                           Merge Cond: ((archive.subscriptions.user_id = archive.subscriptions.user_id) AND ((((count(b.subscription_id)) + 1)) = (count(b.subscription_id))))
                                                                           ->  Sort  (cost=14110.51..14176.40 rows=26353 width=27)
                                                                                 Sort Key: archive.subscriptions.user_id, (((count(b.subscription_id)) + 1))
                                                                                 ->  Hash Join  (cost=9851.19..11541.95 rows=26353 width=27)
                                                                                       Hash Cond: (a.subscription_id = archive.subscriptions.subscription_id)
                                                                                       ->  GroupAggregate  (cost=8419.24..8880.42 rows=26353 width=16)
                                                                                             ->  Sort  (cost=8419.24..8485.13 rows=26353 width=16)
                                                                                                   Sort Key: a.subscription_id
                                                                                                   ->  Hash Left Join  (cost=1405.94..6032.69 rows=26353 width=16)
                                                                                                         Hash Cond: (a.user_id = b.user_id)
                                                                                                         Join Filter: (b.created_at < a.created_at)
                                                                                                         ->  Seq Scan on subscriptions a  (cost=0.00..921.53 rows=26353 width=24)
                                                                                                         ->  Hash  (cost=921.53..921.53 rows=26353 width=24)
                                                                                                               ->  Seq Scan on subscriptions b  (cost=0.00..921.53 rows=26353 width=24)
                                                                                       ->  Hash  (cost=921.53..921.53 rows=26353 width=27)
                                                                                             ->  Seq Scan on subscriptions  (cost=0.00..921.53 rows=26353 width=27)
                                                                           ->  Materialize  (cost=13967.51..14099.28 rows=26353 width=24)
                                                                                 ->  Sort  (cost=13967.51..14033.40 rows=26353 width=24)
                                                                                       Sort Key: archive.subscriptions.user_id, (count(b.subscription_id))
                                                                                       ->  Hash Join  (cost=9825.19..11489.95 rows=26353 width=24)
                                                                                             Hash Cond: (a.subscription_id = archive.subscriptions.subscription_id)
                                                                                             ->  GroupAggregate  (cost=8419.24..8880.42 rows=26353 width=16)
                                                                                                   ->  Sort  (cost=8419.24..8485.13 rows=26353 width=16)
                                                                                                         Sort Key: a.subscription_id
                                                                                                         ->  Hash Left Join  (cost=1405.94..6032.69 rows=26353 width=16)
                                                                                                               Hash Cond: (a.user_id = b.user_id)
                                                                                                               Join Filter: (b.created_at < a.created_at)
                                                                                                               ->  Seq Scan on subscriptions a  (cost=0.00..921.53 rows=26353 width=24)
                                                                                                               ->  Hash  (cost=921.53..921.53 rows=26353 width=24)
                                                                                                                     ->  Seq Scan on subscriptions b  (cost=0.00..921.53 rows=26353 width=24)
                                                                                             ->  Hash  (cost=921.53..921.53 rows=26353 width=24)
                                                                                                   ->  Seq Scan on subscriptions  (cost=0.00..921.53 rows=26353 width=24)
                                                                     ->  Hash  (cost=18166794.40..18166794.40 rows=775682240 width=58)
                                                                           ->  Seq Scan on track_streams  (cost=0.00..18166794.40 rows=775682240 width=58)
                                                               ->  Hash  (cost=758689.47..758689.47 rows=13617047 width=16)
                                                                     ->  Seq Scan on tracks t  (cost=0.00..758689.47 rows=13617047 width=16)
                                                         ->  Hash  (cost=9.99..9.99 rows=495 width=30)
                                                               ->  Seq Scan on streaming_costs fallback  (cost=0.00..9.99 rows=495 width=30)
                                                                     Filter: (country IS NULL)
                                                   ->  Hash  (cost=9.99..9.99 rows=499 width=33)
                                                         ->  Seq Scan on streaming_costs c  (cost=0.00..9.99 rows=499 width=33)
                                             ->  Hash  (cost=1728633.36..1728633.36 rows=70521336 width=14)
                                                   ->  Seq Scan on users  (cost=0.00..1728633.36 rows=70521336 width=14)
                                       ->  Hash  (cost=30163.34..30163.34 rows=1558434 width=16)
                                             ->  Seq Scan on user_registration  (cost=0.00..30163.34 rows=1558434 width=16)
                                 ->  Materialize  (cost=0.00..34.45 rows=1630 width=20)
                                       ->  Seq Scan on time_period  (cost=0.00..26.30 rows=1630 width=20)
               ->  Hash  (cost=10.45..10.45 rows=645 width=12)
                     ->  Seq Scan on times  (cost=0.00..10.45 rows=645 width=12)
         ->  Hash  (cost=29.40..29.40 rows=1940 width=12)
               ->  Seq Scan on last_streams  (cost=0.00..29.40 rows=1940 width=12)
(80 rows)
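The 140-billion-row estimate on the Nested Loop (carried forward by the Sort and GroupAggregate above it) can be reproduced from its two inputs: 775,682,240 rows from the Hash Left Join chain and 1630 rows from time_period. This assumes the planner gave each of the two range conditions in the Join Filter its default inequality selectivity of 1/3; that assumption reproduces the plan's figure exactly:

$$
\underbrace{775\,682\,240}_{\text{track\_streams side}} \times \underbrace{1630}_{\text{time\_period}} \times \tfrac{1}{3} \times \tfrac{1}{3} = \frac{1\,264\,362\,051\,200}{9} \approx 140\,484\,672\,356
$$

Because the estimate is linear in the time_period row count, any error in that count scales the whole estimate by the same factor.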

EDIT: configuration on 8.3

         name            |                                        current_setting                                         
---------------------------+------------------------------------------------------------------------------------------------
 version                   | PostgreSQL 8.3.8 on x86_64-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
 autovacuum                | off
 checkpoint_segments       | 6
 client_encoding           | UTF8
 constraint_exclusion      | on
 default_statistics_target | 100
 effective_cache_size      | 512MB
 external_pid_file         | /var/run/postgresql/8.3-main.pid
 lc_collate                | en_GB.UTF-8
 lc_ctype                  | en_GB.UTF-8
 listen_addresses          | *
 log_line_prefix           | %t 
 log_lock_waits            | on
 maintenance_work_mem      | 2GB
 max_connections           | 50
 max_fsm_pages             | 3000000
 max_stack_depth           | 2MB
 port                      | 5432
 search_path               | "$user", archive, clone, live, public
 server_encoding           | UTF8
 shared_buffers            | 1536MB
 ssl                       | on
 TimeZone                  | GB
 unix_socket_directory     | /var/run/postgresql
 wal_buffers               | 8MB
 work_mem                  | 512MB

Configuration on 9.1

           name             |                                            current_setting                                            
------------------------------+-------------------------------------------------------------------------------------------------------
 version                      | PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real (Debian 4.4.5-8) 4.4.5, 64-bit
 archive_command              | test ! -f /storage/wal/data/%f && cp %p /storage/wal/data/%f
 archive_mode                 | on
 archive_timeout              | 5min
 autovacuum_freeze_max_age    | 1000000000
 autovacuum_vacuum_cost_delay | 20ms
 checkpoint_completion_target | 0.5
 checkpoint_segments          | 64
 checkpoint_timeout           | 1min
 checkpoint_warning           | 0
 client_encoding              | UTF8
 effective_cache_size         | 18GB
 effective_io_concurrency     | 6
 external_pid_file            | /var/run/postgresql/9.1-main.pid
 lc_collate                   | en_GB.UTF8
 lc_ctype                     | en_GB.UTF8
 listen_addresses             | 127.0.0.1,10.10.10.2,10.10.10.1,10.0.0.225
 log_destination              | stderr
 log_directory                | /var/log/postgresql
 log_filename                 | postgresql-%Y-%m-%d.log
 log_line_prefix              | %m [%u@%r:%d] 
 log_min_duration_statement   | 100ms
 log_min_error_statement      | info
 log_min_messages             | info
 logging_collector            | on
 maintenance_work_mem         | 2GB
 max_connections              | 100
 max_stack_depth              | 2MB
 max_wal_senders              | 5
 port                         | 5432
 search_path                  | "$user", archive, clone, live
 server_encoding              | UTF8
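Since work_mem appears in the 8.3 listing (512MB) but not in the 9.1 one, it's worth confirming the value each server actually uses before changing it. Running this on both servers shows the effective setting and where it came from:

```sql
-- Effective value, unit and origin of the settings relevant here.
-- "setting" is expressed in the unit shown in "unit" (kB for work_mem).
-- source = 'default' means nothing (postgresql.conf, ALTER ROLE/DATABASE,
-- or a SET in this session) has overridden the built-in value.
SELECT name, setting, unit, source
  FROM pg_settings
 WHERE name IN ('work_mem', 'log_temp_files', 'temp_tablespaces');
```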

First, that's way too much work_mem unless you have vast amounts of RAM. You are allowing it to allocate 0.5 GB of RAM per sort or hash operation, and a single query can run several of those at once. You will want to reduce that quite a bit. That may be a big part of your problem: cut it down considerably and you may find that solves it.
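A minimal sketch of lowering work_mem for just the nightly job rather than server-wide. The 64MB value is an illustrative assumption, not a tuned recommendation; SET LOCAL only lasts until the end of the enclosing transaction.

```sql
-- Value the current session is using.
SHOW work_mem;

BEGIN;
-- SET LOCAL is undone at COMMIT/ROLLBACK, so other sessions are unaffected.
-- 64MB is an illustrative value only.
SET LOCAL work_mem = '64MB';
SHOW work_mem;   -- reports 64MB inside this transaction
-- ... run the nightly INSERT INTO warehouse.stream_user_facts ... SELECT here ...
COMMIT;

SHOW work_mem;   -- back to the session's previous value
```

To make a lower value persistent for the job's login role, `ALTER ROLE some_role SET work_mem = '64MB';` applies it to every new session of that role (`some_role` is a placeholder).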
