What is the best approach for database queries that return results similar to Twitter's feed of tweets by people you follow?
My website lets users submit posts and subscribe to posts by other people. The homepage of the site displays the most recent posts by the people the user follows. There is no limit to the number of people a user can follow. Some users are following thousands of other users. Some users have made more than 15,000 posts.
The posts database table is organized like this (a few irrelevant columns are omitted for clarity):
id
author_id
post_content
date_added
I have 2 working solutions, but I'm not sure if either is the best approach:
Solution 1: query the table for posts that match any of the author_ids:
SELECT id FROM posts WHERE author_id IN (12, 34, 56, 78, 90, ...) ORDER BY date_time DESC LIMIT 100;
This works, but crawls when users are following thousands of people.
Solution 2: fetch each followed user's feed (from cache where available) and merge the feeds. This works, but sometimes crawls when thousands of user feeds are returned and merged into an array with 100,000+ items. It feels like overkill when all I care about is the most recent 100 items. Additionally, not all user feeds will be in cache. Some old users may no longer use the site but are still followed by new users, so the old user's feed has to be freshly queried (and then cached).
What about (untested, but you get the idea):
SELECT posts.id FROM posts
INNER JOIN followers ON posts.author_id = followers.user_id
WHERE followers.followed_by_user_id = INSERT_USER_ID_HERE
ORDER BY posts.date_time DESC
LIMIT 100;
or
SELECT id FROM posts
WHERE author_id IN (
SELECT user_id FROM followers
WHERE followed_by_user_id = INSERT_USER_ID_HERE
)
ORDER BY date_time DESC
LIMIT 100;
Note: to clarify, the table followers contains two columns, user_id and followed_by_user_id. If a row contains the values (user_id: 7, followed_by_user_id: 42), it means that user 42 follows user 7.
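To make the followers semantics concrete, here is a minimal sketch that runs the subquery form against an in-memory SQLite database. SQLite, the sample rows, and the feed_for helper are assumptions made for illustration only; the question does not say which database engine is in use, and the date column is named date_time to match the queries above.

```python
import sqlite3

def feed_for(conn, user_id, limit=100):
    """Return the newest post ids by authors that user_id follows."""
    rows = conn.execute(
        """
        SELECT id FROM posts
        WHERE author_id IN (
            SELECT user_id FROM followers
            WHERE followed_by_user_id = ?
        )
        ORDER BY date_time DESC
        LIMIT ?
        """,
        (user_id, limit),
    ).fetchall()
    return [row[0] for row in rows]

# Hypothetical sample data: user 42 follows user 7; nobody follows user 8.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER,
                        post_content TEXT, date_time TEXT);
    CREATE TABLE followers (user_id INTEGER, followed_by_user_id INTEGER);
    INSERT INTO followers VALUES (7, 42);
    INSERT INTO posts VALUES (1, 7, 'first',  '2020-01-01 10:00:00');
    INSERT INTO posts VALUES (2, 8, 'other',  '2020-01-02 10:00:00');
    INSERT INTO posts VALUES (3, 7, 'second', '2020-01-03 10:00:00');
    """
)
print(feed_for(conn, 42))  # posts by user 7 only, newest first
```

User 42 sees only the posts written by user 7; the post by user 8 is excluded because no followers row links 42 to 8.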
An optimization for your Solution 2 which avoids merging and sorting all the post ids:
Create an array to hold the result, and initialize it by copying the first author's top-100 post ids, sorted by id.
For each remaining author, check whether the minimum id in the result array is greater than the maximum id of the author's posts. If it is, none of that author's posts can be among the 100 most recent, so the author can be skipped.
Also, you could maintain an array with the maximum post id of every author. Before fetching the top-100 posts of an author, you could check this array. This will avoid fetching/caching the posts of inactive users.
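The steps above can be sketched roughly as follows. This assumes post ids increase over time, so a larger id means a newer post. The names merge_feeds and fetch_top_ids are hypothetical: fetch_top_ids stands in for whatever returns an author's newest post ids from cache or the database, and max_id_by_author plays the role of the per-author maximum-id array. A min-heap is used here in place of a sorted result array, which is a substitution for convenience rather than part of the answer.

```python
import heapq

def merge_feeds(author_ids, max_id_by_author, fetch_top_ids, limit=100):
    """Return the `limit` largest post ids across authors, newest first."""
    heap = []  # min-heap: heap[0] is the smallest id currently kept
    for author in author_ids:
        newest = max_id_by_author.get(author)
        if newest is None:
            continue  # author has no posts at all
        # Once the result is full, skip authors whose newest post is
        # older than everything already kept (no fetch needed).
        if len(heap) == limit and heap[0] > newest:
            continue
        for post_id in fetch_top_ids(author, limit):
            if len(heap) < limit:
                heapq.heappush(heap, post_id)
            elif post_id > heap[0]:
                heapq.heapreplace(heap, post_id)
    return sorted(heap, reverse=True)

# Hypothetical data: each author's post ids, newest first.
posts = {1: [50, 40, 30], 2: [5, 4], 3: [60, 45]}
max_ids = {author: ids[0] for author, ids in posts.items()}
fetched = []  # records which authors were actually fetched

def fetch_top_ids(author, limit):
    fetched.append(author)
    return posts[author][:limit]

result = merge_feeds([1, 2, 3], max_ids, fetch_top_ids, limit=3)
print(result)   # three newest ids overall
print(fetched)  # author 2 is never fetched
```

In this small run the inactive author 2 is skipped purely on the max-id check, which is the saving the answer is after.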
For Solution 1, ordering by id will be a bit faster than ordering by date_time.