
How to search a table with 80 million records faster?

I have a table with about 80 million records, and I want to find all the activities in the lists and workspaces a user has access to. So first I get the ids of those lists and workspaces, and then I run the following query:

select *, COALESCE("origin_created_at", "created_at") AS "created_at",
  COALESCE("updated_at", "origin_updated_at") AS "updated_at" 
from "activities" 
where ("listId" in (310,214088,219,220,271,222,28434,36046,43233,38236,
  1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,
  395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,
  587679,841589,722320,551,170392,421035,197071,632736,632742,632755,
  632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,
  58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,
  158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,
  643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,
  886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,
  226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,
  323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,
  267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,
  74018) 
 or "workspaceId" in (137, 81, 111, 424284, 425935, 430658, 84, 163840, 
  3, 4, 281105, 57, 64642, 96660, 38739, 273574, 295312, 79, 213, 
  240478, 424760, 65, 36989)) 
and (("isBulk" = false or "activities"."type" = 0) 
       and "activities"."deprecated_at" is null) 
order by COALESCE("origin_created_at", "created_at") DESC, "id" desc
limit 40;

and this is the execution plan:

 Limit  (cost=2446886.55..2446886.65 rows=40 width=1002) (actual time=44452.393..44452.418 rows=40 loops=1)
   ->  Sort  (cost=2446886.55..2449439.67 rows=1021250 width=1002) (actual time=44452.391..44452.401 rows=40 loops=1)
         Sort Key: (COALESCE(origin_created_at, created_at)) DESC, id DESC
         Sort Method: top-N heapsort  Memory: 37kB
         ->  Bitmap Heap Scan on activities  (cost=37546.04..2414605.20 rows=1021250 width=1002) (actual time=1043.663..43916.385 rows=568891 loops=1)
               Recheck Cond: (("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[])) OR ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[])))
               Rows Removed by Index Recheck: 9072392
               Filter: ((deprecated_at IS NULL) AND ((NOT "isBulk") OR (type = 0)))
               Rows Removed by Filter: 113630
               Heap Blocks: exact=41259 lossy=271838
               ->  BitmapOr  (cost=37546.04..37546.04 rows=1350377 width=0) (actual time=1032.769..1032.769 rows=0 loops=1)
                     ->  Bitmap Index Scan on activities_list_id_index  (cost=0.00..17333.10 rows=617933 width=0) (actual time=118.412..118.412 rows=507019 loops=1)
                           Index Cond: ("listId" = ANY ('{310,214088,219,220,271,222,28434,36046,43233,38236,1014787,1017501,1065915,162,399844,399845,395721,824491,400,405,408,395873,36,188,178,120,461,1104,27341,27356,83329,29271,158639,482197,587679,841589,722320,551,170392,421035,197071,632736,632742,632755,632758,673517,155,1231,2691,2695,9092,13783,24273,45765,57909,57938,58323,291171,324525,496,5369,54099,54576,98818,569319,1434677,279,158821,127,158197,50301,761351,261,438101,159009,643013,158273,58557,643867,356252,631758,299145,131,179,156,661,241,260,281,245,438106,886,101,72915,90857,144564,166270,230,178981,195046,208561,382159,226599,297964,298318,89043,193559,326394,313589,450540,541359,620442,323458,628644,643014,261008,650332,689117,847849,672369,932660,382843,267000,826590,642775,400339,642875,1282788,1341992,1411789,1515479,74018}'::integer[]))
                     ->  Bitmap Index Scan on activities_workspace_id_index  (cost=0.00..19702.32 rows=732444 width=0) (actual time=914.355..914.355 rows=682628 loops=1)
                           Index Cond: ("workspaceId" = ANY ('{137,81,111,424284,425935,430658,84,163840,3,4,281105,57,64642,96660,38739,273574,295312,79,213,240478,424760,65,36989}'::integer[]))
 Planning time: 2.882 ms
 Execution time: 44452.871 ms
(17 rows)

As stated in the plan, PostgreSQL uses a "Bitmap Heap Scan" to scan the activities, which makes the query slow even though both columns are indexed. In total there are 4 indexes on the table, one on each of the following columns: type, listId, workspaceId, organizationId.

How can I make the query faster? Or is there a better way to rewrite the query?

> As stated in the plan, PostgreSQL uses a "Bitmap Heap Scan" to scan the activities, which makes the query slow even though both columns are indexed.

It is using both of those indexes. The bitmap used to guide the heap scan is built from them, via the BitmapOr node.

One possible culprit is here:

Rows Removed by Index Recheck: 9072392
Heap Blocks: exact=41259 lossy=271838

Increase work_mem until the lossy blocks go away. But if the problem is the time it takes to read the blocks from disk, that probably won't help.
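A minimal sketch of that experiment (the 256MB value is illustrative, not a recommendation; budget work_mem against available RAM and the number of concurrent connections, since each sort or bitmap node in each session can use up to that much):

```sql
-- Raise work_mem for the current session only, then re-check the plan.
-- '256MB' is an illustrative starting point; tune it to your memory budget.
SET work_mem = '256MB';

EXPLAIN (ANALYZE, BUFFERS)
SELECT *,
       COALESCE("origin_created_at", "created_at") AS "created_at",
       COALESCE("updated_at", "origin_updated_at") AS "updated_at"
FROM "activities"
WHERE ("listId" IN (310, 214088 /* ...rest of the list ids... */)
    OR "workspaceId" IN (137, 81 /* ...rest of the workspace ids... */))
  AND ("isBulk" = false OR "activities"."type" = 0)
  AND "activities"."deprecated_at" IS NULL
ORDER BY COALESCE("origin_created_at", "created_at") DESC, "id" DESC
LIMIT 40;
```

In the new plan, check the `Heap Blocks:` line: once it shows only `exact=...` with no `lossy=...` part, the bitmap fits in memory at page granularity and the "Rows Removed by Index Recheck" work disappears.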

