
How to search a table with 80 million records faster?

I have a table with about 80 million records, and I want to find all the activities in the lists and workspaces a user has access to. So first I get the ids of those lists and workspaces, and then I run the following query:

select *, COALESCE("origin_created_at", "created_at") AS "created_at",
  COALESCE("updated_at", "origin_updated_at") AS "updated_at" 
from "activities" 
where ("listId" in (310,214088,219,220,271,222,28434,36046,43233,38236,
  1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,
  395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,
  587679,841589,722320,551,170392,421035,197071,632736,632742,632755,
  632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,
  58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,
  158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,
  643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,
  886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,
  226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,
  323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,
  267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,
  74018) 
 or "workspaceId" in (137, 81, 111, 424284, 425935, 430658, 84, 163840, 
  3, 4, 281105, 57, 64642, 96660, 38739, 273574, 295312, 79, 213, 
  240478, 424760, 65, 36989)) 
and (("isBulk" = false or "activities"."type" = 0) 
       and "activities"."deprecated_at" is null) 
order by COALESCE("origin_created_at", "created_at") DESC, "id" desc
limit 40;

and this is the execution plan:

 Limit  (cost=2446886.55..2446886.65 rows=40 width=1002) (actual time=44452.393..44452.418 rows=40 loops=1)
   ->  Sort  (cost=2446886.55..2449439.67 rows=1021250 width=1002) (actual time=44452.391..44452.401 rows=40 loops=1)
         Sort Key: (COALESCE(origin_created_at, created_at)) DESC, id DESC
         Sort Method: top-N heapsort  Memory: 37kB
         ->  Bitmap Heap Scan on activities  (cost=37546.04..2414605.20 rows=1021250 width=1002) (actual time=1043.663..43916.385 rows=568891 loops=1)
               Recheck Cond: (("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[])) OR ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[])))
               Rows Removed by Index Recheck: 9072392
               Filter: ((deprecated_at IS NULL) AND ((NOT "isBulk") OR (type = 0)))
               Rows Removed by Filter: 113630
               Heap Blocks: exact=41259 lossy=271838
               ->  BitmapOr  (cost=37546.04..37546.04 rows=1350377 width=0) (actual time=1032.769..1032.769 rows=0 loops=1)
                     ->  Bitmap Index Scan on activities_list_id_index  (cost=0.00..17333.10 rows=617933 width=0) (actual time=118.412..118.412 rows=507019 loops=1)
                           Index Cond: ("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[]))
                     ->  Bitmap Index Scan on activities_workspace_id_index  (cost=0.00..19702.32 rows=732444 width=0) (actual time=914.355..914.355 rows=682628 loops=1)
                           Index Cond: ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[]))
 Planning time: 2.882 ms
 Execution time: 44452.871 ms
(17 rows)

As stated in the plan, PostgreSQL uses a "Bitmap Heap Scan" to scan the activities, which makes the query slow even though both columns are indexed. In total there are 4 indexes on the table, one on each of the following columns: type, listId, workspaceId, organizationId.

How can I make the query faster? Or is there a better way to rewrite the query?

> As stated in the plan, PostgreSQL uses a "Bitmap Heap Scan" to scan the activities, which makes the query slow even though both columns are indexed.

It is using both of those indexes. The bitmap used to guide the heap scan is built from them, via the BitmapOr node.

One possible culprit is here:

Rows Removed by Index Recheck: 9072392
Heap Blocks: exact=41259 lossy=271838

Increase work_mem until the lossy blocks go away. But if the problem is the time it takes to read the blocks from disk, that probably won't help.
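A minimal sketch of that experiment (the 256MB value is illustrative, not a recommendation; budget work_mem against available RAM and the number of concurrent connections, since each sort or bitmap node in each session can use up to that much):

```sql
-- Raise work_mem for the current session only, then re-check the plan.
-- '256MB' is an illustrative starting point; tune it to your memory budget.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *,
       COALESCE("origin_created_at", "created_at") AS "created_at",
       COALESCE("updated_at", "origin_updated_at") AS "updated_at"
FROM "activities"
WHERE ("listId" IN (310, 214088 /* ...rest of the list ids... */)
    OR "workspaceId" IN (137, 81 /* ...rest of the workspace ids... */))
  AND ("isBulk" = false OR "activities"."type" = 0)
  AND "activities"."deprecated_at" IS NULL
ORDER BY COALESCE("origin_created_at", "created_at") DESC, "id" DESC
LIMIT 40;
```

In the new plan, check the `Heap Blocks:` line: once it shows only `exact=...` with no `lossy=...` part, the bitmap fits in memory at page granularity and the "Rows Removed by Index Recheck" work disappears.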

