How to optimise a SQL SELECT query for generating a user's newsfeed?

I'm building a feature that generates a user's newsfeed from a table of posts. This is the SQL statement we are using:

SELECT *
FROM "posts" AS "post"
WHERE "post"."sourceId" IN (...)
ORDER BY "post"."createdAt" DESC, "post"."timestamp" DESC
LIMIT 10;

The posts table currently has 200K+ rows and is likely to grow much larger. Database performance isn't my strongest skill, but is there any way to optimise this query so that it runs as fast as possible? I'm assuming it's not enough to add an index on the sourceId column, and that I would instead need a multi-column index that also takes the ORDER BY columns into account.

For this query:

SELECT p.*
FROM posts p
WHERE p.sourceId IN (...)
ORDER BY p.createdAt DESC, p.timestamp DESC
LIMIT 10;

The only index that can really help is an index on posts(sourceId).

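Note that I removed the double quotes ("). Do not escape table and column names when you define them; then you don't need to escape them when you use them. (Postgres folds unquoted identifiers to lower case, so the unquoted form only works if the columns were created without quotes.)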

However, the query still has to sort all the matching rows, and that can be time-consuming. A more complicated query is easier for Postgres to optimize:

select p.*
from ((select p.*
       from posts p
       where sourceId = $si_1
       order by p.createdAt desc, p.timestamp desc
       limit 10
      ) union all
      (select p.*
       from posts p
       where sourceId = $si_2
       order by p.createdAt desc, p.timestamp desc
       limit 10
      ) union all
      . . .
    ) p
order by p.createdAt desc, p.timestamp desc
limit 10;

This query can use an index on posts(sourceId, createdAt desc, timestamp desc) for the inner selects. That should be fast. The outer order by will still need a sort, but the volume of data should be much smaller.
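
The number of sources varies per user, so in practice the union all query has to be generated. A minimal sketch in Python of how that might look, assuming numbered placeholders ($1, $2, ...) in the style of a Postgres prepared statement; build_feed_query is a hypothetical helper, not something from the question:

```python
def build_feed_query(n_sources: int, limit: int = 10) -> str:
    """Build the union all newsfeed query for n_sources source IDs.

    Placeholders are numbered $1..$n; the caller binds the actual IDs.
    build_feed_query is a hypothetical name used only for illustration.
    """
    if n_sources < 1:
        raise ValueError("need at least one source")
    limit = int(limit)  # inlined into the SQL text, so force an integer
    order = "order by p.createdAt desc, p.timestamp desc"
    # One index-friendly branch per source: top `limit` rows of that source.
    branches = [
        f"(select p.* from posts p where p.sourceId = ${i} {order} limit {limit})"
        for i in range(1, n_sources + 1)
    ]
    inner = "\n      union all\n      ".join(branches)
    # The outer query merges the per-source winners and keeps the overall top rows.
    return f"select p.*\nfrom ({inner}\n     ) p\n{order}\nlimit {limit};"


query = build_feed_query(3)
print(query)
```

Only the placeholders and the limit are interpolated into the text; the source IDs themselves are still passed as bound parameters.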

For instance, if a typical source has 10,000 rows and you are looking at only 3 sources, then your version of the query needs to sort 30,000 rows to fetch 10. This version fetches 30 rows using the index and then sorts them to get the final 10.

That would be a big difference in performance.
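
To check that the rewrite returns the same top 10, here is a rough sketch that runs both forms against made-up data. It uses an in-memory SQLite database as a stand-in for Postgres, which is an assumption for the sake of a runnable example; the table size, the 20 sources and the index name posts_feed_idx are invented, and it says nothing about the actual Postgres query plans:

```python
import random
import sqlite3

random.seed(0)  # fixed seed so the made-up data is reproducible

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table posts (id integer primary key, sourceId integer, "
    "createdAt integer, timestamp integer)"
)
# 20,000 made-up posts spread over 20 hypothetical sources.
conn.executemany(
    "insert into posts (sourceId, createdAt, timestamp) values (?, ?, ?)",
    [
        (random.randrange(20), random.randrange(10**6), random.randrange(10**6))
        for _ in range(20_000)
    ],
)
conn.execute(
    "create index posts_feed_idx "
    "on posts (sourceId, createdAt desc, timestamp desc)"
)

source_ids = [3, 7, 11]
order = "order by createdAt desc, timestamp desc"

# Original form: one IN list, one sort over every matching row.
marks = ", ".join("?" for _ in source_ids)
in_rows = conn.execute(
    f"select createdAt, timestamp from posts "
    f"where sourceId in ({marks}) {order} limit 10",
    source_ids,
).fetchall()

# Rewritten form: top 10 per source, then a final sort of at most 30 rows.
branch = (
    "select createdAt, timestamp from "
    f"(select createdAt, timestamp from posts where sourceId = ? {order} limit 10)"
)
union_sql = " union all ".join(branch for _ in source_ids)
union_rows = conn.execute(
    f"select createdAt, timestamp from ({union_sql}) {order} limit 10",
    source_ids,
).fetchall()

matching = conn.execute(
    f"select count(*) from posts where sourceId in ({marks})", source_ids
).fetchone()[0]
print("rows matching the IN list:", matching)
print("rows sorted by the outer query (at most):", 10 * len(source_ids))
print("same top 10:", in_rows == union_rows)
```

The comparison is on the (createdAt, timestamp) pairs rather than on row IDs, so ties in the sort key cannot make the two results look different.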

You may find that just an index on sourceId is sufficient:

CREATE INDEX src_idx ON posts (sourceId);

Postgres would then have to sort the rows that pass the WHERE clause itself, rather than reading them from the index in order. Adding the columns from the ORDER BY clause to the index might also help:

CREATE INDEX idx ON posts (sourceId, createdAt DESC, timestamp DESC);

This might speed up the sorting step, because within each sourceId the index entries are already stored in createdAt, timestamp order.


 