
Doctrine one-to-many performance at scale

I've built a news feed in which post has a one-to-many relationship with post_reaction. The concept is simple: a post can be liked, and each like is stored in the post_reaction table along with who liked it and the type of reaction (like, love, etc.).

Everything works just fine™; however, performance decreases as things scale, specifically as the post_reaction table grows.

For testing purposes, I generated 200 posts and gave each post 1,000 reactions. This results in 200,000 total reactions stored in the post_reaction table.

My Twig template is given a list of posts, limited to 20. As the template iterates through the posts to display them, it calls post.reactions|length on each one to count its reactions. This executes the following DB query:

SELECT
  t0.reaction AS reaction_1,
  t0.id AS id_2,
  t0.created AS created_3,
  t0.post_id AS post_id_4,
  t0.user AS user_5
FROM
  post_reaction t0
WHERE
  t0.post_id = ?

On average this query takes 4-7 ms each time it runs, and it runs once for each of the 20 posts I'm rendering. That totals ~100 ms of DB queries just to count the reactions.

That doesn't seem too bad; however, we also observe some overhead from processing this much data in the application.

Looking at the profiler for the entire request, we see the following:

[Profiler screenshot: page performance profile]

Our overall processing time for this request was 585 ms.

components/news_post.html.twig is the component that calls post.reactions|length, which triggers the DB query. If we make the same request without querying reactions, we observe the following:

[Profiler screenshot: page performance profile, without querying reactions]

Our overall processing time for this request was 179 ms.

That's 406 ms (69.4%) faster. I believe this is mostly attributable to overhead in Doctrine while it hydrates the 20,000 rows (20 posts × 1,000 reactions) into objects, only for us to count them afterwards.
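The following minimal sketch makes the cost difference concrete. It uses Python's built-in sqlite3 module as a stand-in, so it is not Doctrine or MySQL, and the table has only hypothetical simplified columns. It contrasts two ways of counting reactions. The first loads every reaction row and counts in the application, which is roughly what post.reactions|length is described as doing. The second asks the database for COUNT(*):

```python
import sqlite3

# Hypothetical in-memory stand-in for post_reaction (SQLite, not MySQL/Doctrine),
# with only the columns this sketch needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_reaction (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        reaction INTEGER NOT NULL
    );
    CREATE INDEX idx_post_reaction_post_id ON post_reaction (post_id);
""")

# 20 displayed posts x 1,000 reactions each = 20,000 rows, as on the profiled page.
conn.executemany(
    "INSERT INTO post_reaction (post_id, reaction) VALUES (?, 1)",
    [(post_id,) for post_id in range(1, 21) for _ in range(1000)],
)

def count_by_loading_rows(post_id):
    """Fetch every reaction row for the post, then count them in the application."""
    rows = conn.execute(
        "SELECT id, post_id, reaction FROM post_reaction WHERE post_id = ?",
        (post_id,),
    ).fetchall()
    return len(rows)

def count_in_database(post_id):
    """Let the database count; a single row comes back regardless of reaction volume."""
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM post_reaction WHERE post_id = ?", (post_id,)
    ).fetchone()
    return count

loaded = sum(count_by_loading_rows(p) for p in range(1, 21))
counted = sum(count_in_database(p) for p in range(1, 21))
print(loaded, counted)  # prints: 20000 20000
```

Both variants still issue one query per post. The second only avoids transferring 1,000 rows per post to the application and turning them into objects.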

In an attempt to mitigate this, I wanted to see whether joining the reactions onto my post query would help.

SELECT
  p0_.replies_allowed AS replies_allowed_0,
  p0_.highlight_date AS highlight_date_1,
  p0_.title AS title_2,
  p0_.content AS content_3,
  p0_.id AS id_4,
  p0_.created AS created_5,
  p0_.updated AS updated_6,
  p0_.news_feed_id AS news_feed_id_7,
  p0_.created_by_id AS created_by_id_8,
  p0_.updated_by_id AS updated_by_id_9
FROM
  post p0_
  INNER JOIN post_reaction p1_ ON (p1_.post_id = p0_.id)
WHERE
  p0_.news_feed_id = ?
ORDER BY
  CASE WHEN p0_.highlight_date > ? THEN 0 ELSE 1 END ASC,
  p0_.created DESC
LIMIT
  20

However, this breaks the LIMIT 20 clause in the query. LIMIT applies to the joined rows rather than to posts, and because each post has so many reactions in this dataset, only one post ends up being returned.
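The following small reproduction shows why this happens. It uses Python's sqlite3 module with hypothetical data (5 posts × 1,000 reactions, simplified columns), so it illustrates plain SQL behaviour, not how Doctrine builds or hydrates the query. The second query shows one common workaround, which is not taken from the original post: aggregate with GROUP BY so that each post yields exactly one row before LIMIT applies.

```python
import sqlite3

# Hypothetical in-memory stand-in for post / post_reaction (SQLite, simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post (id INTEGER PRIMARY KEY, created INTEGER NOT NULL);
    CREATE TABLE post_reaction (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL REFERENCES post(id),
        reaction INTEGER NOT NULL
    );
""")
# 5 posts (post 5 is the newest), each with 1,000 reactions.
conn.executemany("INSERT INTO post (id, created) VALUES (?, ?)", [(p, p) for p in range(1, 6)])
conn.executemany(
    "INSERT INTO post_reaction (post_id, reaction) VALUES (?, 1)",
    [(p,) for p in range(1, 6) for _ in range(1000)],
)

# LIMIT counts joined rows, not posts: all 3 rows belong to the newest post.
joined = conn.execute("""
    SELECT p.id FROM post p
    JOIN post_reaction r ON r.post_id = p.id
    ORDER BY p.created DESC
    LIMIT 3
""").fetchall()

# Aggregating first leaves one row per post, so LIMIT now limits posts.
aggregated = conn.execute("""
    SELECT p.id, COUNT(r.id) AS reaction_count
    FROM post p
    LEFT JOIN post_reaction r ON r.post_id = p.id
    GROUP BY p.id, p.created
    ORDER BY p.created DESC
    LIMIT 3
""").fetchall()

print(joined)      # [(5,), (5,), (5,)]
print(aggregated)  # [(5, 1000), (4, 1000), (3, 1000)]
```

The aggregated form still makes the database count reactions for every post in the feed before it can sort and limit. On a large feed, counting only the posts that are actually displayed may be cheaper.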

I'm not sure whether I should keep developing a way to make the join work, or explore an alternative, whatever that may be. Ideally I'd like to eliminate the extra 406 ms of execution time, since it's almost 70% of the total page processing time, just to count likes.


Edit: As requested, the output of SHOW CREATE TABLE post_reaction:

CREATE TABLE `post_reaction` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `post_id` int(11) DEFAULT NULL,
  `user` int(11) DEFAULT NULL,
  `reaction` int(11) NOT NULL,
  `reaction_timestamp` datetime NOT NULL,
  PRIMARY KEY (`id`),
  KEY `IDX_1B3A8E564B89032C` (`post_id`),
  KEY `IDX_1B3A8E568D93D649` (`user`),
  CONSTRAINT `FK_1B3A8E564B89032C` FOREIGN KEY (`post_id`) REFERENCES `post` (`id`),
  CONSTRAINT `FK_1B3A8E568D93D649` FOREIGN KEY (`user`) REFERENCES `user` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=200786 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci

  • (2nd query) Don't JOIN to post_reaction, since you are not using any of its columns.

  • The complexity of the ORDER BY makes it impossible to do anything faster than look at all 1000 reactions: the CASE expression in the sort cannot be served by an index, so every row must be read and sorted before the LIMIT is applied. Hence the LIMIT has very little effect on performance.

  • Please provide SHOW CREATE TABLE post_reaction; there may be some improvements we can make there. But you certainly need some index starting with post_id. We may get some improvement by rearranging the PRIMARY KEY to start with that column.

  • (I don't know anything about the Controller or Twig. They seem to be the costly part?)
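One way to check that a per-post count can use an index starting with post_id is to inspect the query plan. The sketch below does this with SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; the table and index names are hypothetical. In MySQL the equivalent check is EXPLAIN, whose output looks different. Note that the schema in the edit already has such an index, KEY IDX_1B3A8E564B89032C (post_id).

```python
import sqlite3

# Hypothetical simplified post_reaction table, initially without a post_id index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post_reaction (id INTEGER PRIMARY KEY, post_id INTEGER, reaction INTEGER NOT NULL)"
)

QUERY = "SELECT COUNT(*) FROM post_reaction WHERE post_id = ?"

def plan(sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output for sql."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, (1,))]

# Without an index the whole table must be scanned (wording varies by SQLite version).
before = plan(QUERY)

# With an index starting with post_id, SQLite can search the index instead.
conn.execute("CREATE INDEX idx_post_reaction_post_id ON post_reaction (post_id)")
after = plan(QUERY)

print(before)
print(after)
```

The exact plan text differs between SQLite versions, so only the presence of a scan before the index and of the index name afterwards should be relied on.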

More

Counting "the number of reactions [for every post]" is a single SQL query, and a fast one:

SELECT post_id,
       COUNT(*) AS reaction_count
    FROM post_reaction
    GROUP BY post_id;

No iterating through posts, no 20 at a time: everything is done in one simple pass over an index on that one table.

I tried an equivalent query on a table of 500K cities in 92 countries. It took 0.13 seconds.

The lesson here is that SQL shines when it is asked to do the same thing to lots of rows.
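The GROUP BY query counts reactions for every post in the table. If the page only needs the 20 posts it renders, one variation (not part of the original answer) restricts the same GROUP BY to those IDs. This is still one query rather than one per post. The sketch below uses Python's sqlite3 as a stand-in for the MySQL/Doctrine stack; reaction_counts and the data are hypothetical.

```python
import sqlite3

# Hypothetical in-memory stand-in for post_reaction (SQLite, simplified columns).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE post_reaction (
        id INTEGER PRIMARY KEY,
        post_id INTEGER NOT NULL,
        reaction INTEGER NOT NULL
    );
    CREATE INDEX idx_post_reaction_post_id ON post_reaction (post_id);
""")
# Hypothetical data: post 1 has 1,000 reactions, post 2 has 2, post 3 has none.
conn.executemany(
    "INSERT INTO post_reaction (post_id, reaction) VALUES (?, 1)",
    [(1,)] * 1000 + [(2,)] * 2,
)

def reaction_counts(post_ids):
    """Count reactions for all displayed posts with a single GROUP BY query."""
    if not post_ids:
        return {}
    # Only "?" placeholders are interpolated; the IDs themselves are bound parameters.
    placeholders = ", ".join("?" for _ in post_ids)
    rows = conn.execute(
        "SELECT post_id, COUNT(*) AS reaction_count FROM post_reaction "
        f"WHERE post_id IN ({placeholders}) GROUP BY post_id",
        list(post_ids),
    ).fetchall()
    counts = dict.fromkeys(post_ids, 0)  # posts without reactions produce no row
    counts.update(rows)
    return counts

print(reaction_counts([1, 2, 3]))  # {1: 1000, 2: 2, 3: 0}
```

The application would run this once per page and pass the resulting map to the template, instead of the template calling post.reactions|length for each post. How that is wired into Doctrine and Twig is left open here.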

Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. Questions: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM