Performance of search query bottlenecked 98% by mutual friends despite caching

On my social networking website, which is similar to Facebook, about 98% of the search time is spent in one step. I want to rank the search results (we can assume they are all users) by the number of mutual friends each result has with the searching user.

My friends table has 3 columns:

  • user_id (the person who sends the request)
  • friend_id (the person who receives the request)
  • pending (boolean; true until the request has been accepted)

user_id and friend_id are both foreign keys that reference users.id.

Finding the friend_ids of a user is simple. It looks like this:

def friends
  Friend.where(
    '(user_id = :id OR friend_id = :id) AND pending = false',
     id: self.id
  ).pluck(:user_id, :friend_id)
   .flatten
   .uniq
   .reject { |id| id == self.id }
end

So, after getting the results that match the search query, ranking them by mutual friends requires the following steps:

  • Get the user_ids of all the searching user's friends: Set(A). The friends method above does this.
  • Loop over each of the ids in Set(A):
    • Get the user_ids of all the friends of |id|: Set(B). Again, this is done by the friends method.
    • Find the size of the intersection of Set(A) and Set(B).
  • Sort all results in descending order of intersection size.
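
The ranking steps above can be sketched in plain Ruby. This is only an illustration: it assumes the friend sets are already loaded into memory, and the method name rank_by_mutual_friends and the sample ids are hypothetical.

```ruby
require 'set'

# Hypothetical helper: ranks candidate user ids by how many friends they
# share with the searching user. No database access happens here.
#
# searcher_friends  - Set of the searching user's friend ids (Set A)
# candidate_friends - Hash of candidate id => Set of that candidate's friend ids (Set B)
def rank_by_mutual_friends(searcher_friends, candidate_friends)
  counts = candidate_friends.map do |candidate_id, friends|
    [candidate_id, (searcher_friends & friends).size]
  end
  # Highest count first; ties fall back to ascending id so the order is stable.
  counts.sort_by { |candidate_id, count| [-count, candidate_id] }
end

set_a = Set[2, 3, 4]
candidates = {
  10 => Set[2, 3, 99],
  11 => Set[4],
  12 => Set[2, 3, 4]
}
ranked = rank_by_mutual_friends(set_a, candidates)
```

Each element of ranked is a [user_id, mutual_count] pair, so the search results can be reordered by looking up their ids in that list.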

The most expensive operation here is obviously getting the friend_ids of hundreds of users, so I cached the friend_ids of all users to speed it up. The difference in performance was amazing, but I'm curious whether it can be improved further.

I'm wondering if there is an efficient way to get the friend_ids of all the desired users in a single query. Something like:

SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....

Can someone help me write a fast SQL or ActiveRecord query for this?

That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the ranking described above. Since I won't be hitting the cache for thousands of users and their friend_ids, I think it will speed up the process significantly.
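
A minimal sketch of that hash, assuming the accepted friendship rows for all search results have already been fetched in one query as [user_id, friend_id] pairs (for example with pluck(:user_id, :friend_id), as in the friends method). The helper name build_friend_index and the sample rows are hypothetical.

```ruby
require 'set'

# Hypothetical helper: turns accepted friendship rows into a lookup hash of
# user id => Set of friend ids. Each row is recorded in both directions,
# because a friendship is stored only once in the friends table.
def build_friend_index(rows)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each do |user_id, friend_id|
    index[user_id] << friend_id
    index[friend_id] << user_id
  end
  index
end

# Sample rows: [user_id, friend_id] pairs for accepted requests.
rows = [[1, 2], [3, 1], [2, 3], [4, 2]]
friend_index = build_friend_index(rows)
```

If the rows are filtered to those touching the search results, only the entries for the result ids are complete; entries for other users contain just their friendships with those results.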

Caching your friends table in RAM is not a viable approach if you expect your site to grow to a large number of users, but I'm sure it works well for a smallish number of users.

It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query is comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.

You did not give many details, so I decided to start by defining a minimal model DB:

create table users (
  user_id int not null primary key,
  nick varchar(32)
  );

create table friends (
  user_id int not null,
  friend_id int not null,
  pending bool,
  primary key (user_id, friend_id),
  foreign key (user_id) references users(user_id),
  foreign key (friend_id) references users(user_id),
  check (user_id < friend_id)
  );

The check constraint on friends prevents the same pair of users from being listed in the table in both orders, and of course the PK prevents the same pair from being entered multiple times in the same order. The PK also automatically has a unique index associated with it.
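
With that constraint in place, application code has to order the two ids before inserting a row. A small sketch of such a step, where the helper name normalized_friend_pair is hypothetical:

```ruby
# Hypothetical helper: orders two user ids so that the pair satisfies
# check (user_id < friend_id) before it is inserted into friends.
def normalized_friend_pair(id_a, id_b)
  raise ArgumentError, 'a user cannot befriend themselves' if id_a == id_b

  id_a < id_b ? [id_a, id_b] : [id_b, id_a]
end

pair = normalized_friend_pair(42, 7)
```

One consequence of this layout is that the user_id column no longer records who sent the request, so that information would need a separate column if it matters.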

Since I suppose the 'is a friend of' relation is meant to be logically symmetric, it is convenient to define a view that presents that symmetry:

create view friends_symmetric (user_id, friend_id) as (
  select user_id, friend_id from friends where not pending
  union all
  select friend_id, user_id from friends where not pending
  );

(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends in place of friends_symmetric in what follows.)

As a model query whose results you want to rank, I take this:

select * from users where nick like 'Sat%';

The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:

(Update: modified this query to filter out duplicate results.)

select *
from (
    select
      u.*,
      count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
      row_number() over (partition by u.user_id) as copy_num
    from 
      users u
      left join (
          select
            f1.friend_id as shared_friend_id,
            f2.friend_id as friend_id
          from friends_symmetric f1
            join friends_symmetric f2
              on f1.friend_id = f2.user_id
          where f1.user_id = ?
            and f2.friend_id != f1.user_id
        ) mutual
        on u.user_id = mutual.friend_id
    where u.nick like 'Sat%'
  ) all_rows
where copy_num = 1
order by num_shared desc

where the ? is a placeholder for a parameter containing the ID of User1.

Edited to add:

I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
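
One possible shape for that aggregate variant, shown as a Ruby heredoc so it can be handed to whatever database client the application uses. This is an untested sketch against the model schema above, and the constant name is hypothetical.

```ruby
# Sketch of the aggregate variant (not run against a real database).
# The inline view counts shared friends per candidate with GROUP BY, so the
# window functions and the outer copy_num filter are no longer needed.
# The single ? is the placeholder for the searching user's id, as before.
MUTUAL_FRIENDS_AGGREGATE_SQL = <<~SQL
  select u.*, coalesce(mutual.num_shared, 0) as num_shared
  from users u
    left join (
        select f2.friend_id, count(*) as num_shared
        from friends_symmetric f1
          join friends_symmetric f2
            on f1.friend_id = f2.user_id
        where f1.user_id = ?
          and f2.friend_id != f1.user_id
        group by f2.friend_id
      ) mutual
      on u.user_id = mutual.friend_id
  where u.nick like 'Sat%'
  order by num_shared desc
SQL
```

The coalesce keeps hits with no shared friends in the result with a count of 0, which matches what the left join in the window-function version produces.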


There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).

I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.

Assuming that you want a relation of the User model, whose primary key is id, you should be able to join onto a subquery that calculates the number of mutual friends:

class User < ActiveRecord::Base
  def other_users_ordered_by_mutual_friends
    self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
      SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
        SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
        UNION ALL
        SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
      ) all_friends INNER JOIN (
        SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
        UNION ALL
        SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
      ) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
    ) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
  end
end

The subquery selects all user IDs with their associated friends and inner joins that to another select containing the IDs of all the current user's friends. Since it groups by user_id and selects the count, we get the number of mutual friends for each user_id. I have not tested this since I don't have any sample data, but it should work.

Since this returns a scope, you can chain other scopes/conditions to the relation:

current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)

The select scope as written will also give you access to the field friends_count on instances within the relation:

<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
  <p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>

John had a great idea with the friends_symmetric view. With two filtered indexes (one on (friend_id, user_id) and the other on (user_id, friend_id)) it's going to work great. However, the query can be a bit simpler:

WITH user_friends AS(
  SELECT user_id, array_agg(friend_id) AS friends
    FROM friends_symmetric
        WHERE user_id = :user_id -- id of our user
    GROUP BY user_id
)
SELECT u.*
       ,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
       ,count(*) AS shared_count    
FROM user_friends AS uf     
    JOIN friends_symmetric AS f
        ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
    JOIN users AS u
        ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id
