So on my social networking website, similar to facebook, my search speed is bottlenecked like 98% by this one part. I want to rank the results based on the number of mutual friends the searching user has, with all of the results (we can assume they are users)
My friends table has 3 columns -
user_id and friend_id are both foreign keys that reference users.id
Finding friend_ids of a user is simple, it looks like this
def friends
Friend.where(
'(user_id = :id OR friend_id = :id) AND pending = false',
id: self.id
).pluck(:user_id, :friend_id)
.flatten
.uniq
.reject { |id| id == self.id }
end
So, after getting the results that match the search query, ranking the results by mutual friends, requires following steps -
The most expensive operation over here obviously getting friend_ids of of hundreds of users. So I cached the friend_ids of all the users to speed it up. The difference in performance was amazing, but I'm curious if it can be further improved.
I'm wondering if there is a way that I can get friend_ids of all the desired users in a single query, that is efficient. Something like -
SELECT user_id, [array of friend_ids of the user with id = user_id]
FROM friends
....
Can someone help me write a fast SQL or ActiveRecord query for this?
That way I can store the user_ids of all the search results and their corresponding friend_ids in a hash or some other fast data structure, and then perform the same operation of ranking (that I mentioned above). Since I won't be hitting the cache for thousands of users and their friend_ids, I think it'll speed up the process significantly
Caching your friends
table in RAM is not a viable approach if you expect your site to grow to large numbers of users, but I'm sure it does great for a smallish number of users.
It is to your advantage to get the most work you can out of the database with as few calls as possible. It is inefficient to issue large numbers of queries, as the overhead per query as comparatively large. Moreover, databases are built for the kind of task you're trying to perform. I think you are doing far too much work on the Ruby side, and you ought to let the database do the kind of work it does best.
You did not give many details, so I decided to start by defining a minimal model DB:
create table users (
user_id int not null primary key,
nick varchar(32)
);
create table friends (
user_id int not null,
friend_id int not null,
pending bool,
primary key (user_id, friend_id),
foreign key (user_id) references users(user_id),
foreign key (friend_id) references users(user_id),
check (user_id < friend_id)
);
The check
constraint on friends
avoids the same pair of users being listed in the table in both orders, and of course the PK prevents the same pair from being enrolled multiple times in the same order. The PK also automatically has a unique index associated with it.
Since I suppose the 'is a friend of' relation is supposed to be logically symmetric, it is convenient to define a view that presents that symmetry:
create view friends_symmetric (user_id, friend_id) as (
select user_id, friend_id from friends where not pending
union all
select friend_id, user_id from friends where not pending
);
(If friendship is not symmetric then you can drop the check constraint and the view, and use table friends
in place of friends_symmetric
in what follows.)
As a model query whose results you want to rank, then, I take this:
select * from users where nick like 'Sat%';
The objective is to return result rows in descending order of the number of friends each hit has in common with User1, the user on whose behalf the query is run. You might do that like so:
( update : modified this query to filter out duplicate results)
select *
from (
select
u.*,
count(mutual.shared_friend_id) over (partition by u.user_id) as num_shared,
row_number() over (partition by u.user_id) as copy_num
from
users u
left join (
select
f1.friend_id as shared_friend_id,
f2.friend_id as friend_id
from friends_symmetric f1
join friends_symmetric f2
on f1.friend_id = f2.user_id
where f1.user_id = ?
and f2.friend_id != f1.user_id
) mutual
on u.user_id = mutual.friend_id
where u.nick like 'Sat%'
) all_rows
where copy_num = 1
order by num_shared desc
where the ?
is a placeholder for a parameter containing the ID of the User1.
Edited to add:
I have structured this query with window functions instead of an aggregate query with the idea that such a structure will be easier for the query planner to optimize. Nevertheless, the inline view "mutual" could instead be structured as an aggregate query that computes the number of shared friends that the searching user has with every user that shares at least one friend, and that would permit one level of inline view to be avoided. If performance of the provided query is or becomes inadequate, then it would be worthwhile to test that variant.
There are other ways to approach the problem of performing the sorting in the DB, some of which may perform better, and there may be ways to improve the performance of each by tweaking the database (adding indexes or constraints, modifying table definitions, computing db statistics, ...).
I cannot predict whether that query will outperform what you're doing now, but I assure you that it scales better, and it is easier to maintain.
Assuming that you want a relation of the User
model whose primary key is id
, you should be able to join onto a subquery that calculates the number of mutual friends:
class User < ActiveRecord::Base
def other_users_ordered_by_mutual_friends
self.class.select("users.*, COALESCE(f.friends_count, 0) AS friends_count").joins("LEFT OUTER JOIN (
SELECT all_friends.user_id, COUNT(DISTINCT all_friends.friend_id) AS friends_count FROM (
SELECT f1.user_id, f1.friend_id FROM friends f1 WHERE f1.pending = false
UNION ALL
SELECT f2.friend_id AS user_id, f2.user_id AS friend_id FROM friends f2 WHERE f2.pending = false
) all_friends INNER JOIN (
SELECT DISTINCT f1.friend_id AS user_id FROM friends f1 WHERE f1.user_id = #{id} AND f1.pending = false
UNION ALL
SELECT DISTINCT f2.user_id FROM friends f2 WHERE f2.friend_id = #{id} AND f2.pending = false
) user_friends ON user_friends.user_id = all_friends.friend_id GROUP BY all_friends.user_id
) f ON f.user_id = users.id").where.not(id: id).order("friends_count DESC")
end
end
The subquery selects all user IDs with associated friends and inner joins that to another select with all of the current user's friends' IDs. Since it groups by the user_id
and selects the count, we get the number of mutual friends for each user_id
. I have not tested this since I don't have any sample data, but it should work.
Since this returns a scope, you can chain other scopes/conditions to the relation:
current_user.other_users_ordered_by_mutual_friends.where(attribute1: value1).reorder(:attribute2)
The select
scope as written will also give you access to the field friends_count
on instances within the relation:
<%- current_user.other_users_ordered_by_mutual_friends.each do |user| -%>
<p>User <%= user.id -%> has <%= user.friends_count -%> mutual friends.</p>
<%- end -%>
John had a great idea with the friends_symetric
view. With two filtered indexes (one on (friend_id,user_id and the other on (user_id,friend_id) ) it's gonna work great. However the query can be a bit simpler
WITH user_friends AS(
SELECT user_id, array_agg(friend_id) AS friends
FROM friends_symmetric
WHERE user_id = :user_id -- id of our user
GROUP BY user_id
)
SELECT u.*
,array_agg(friend_id) AS shared_friends -- aggregated ids of friends in case they are needed for something
,count(*) AS shared_count
FROM user_friends AS uf
JOIN friends_symmetric AS f
ON f.user_id = ANY(uf.friends) AND f.friend_id = ANY(uf.friends)
JOIN user
ON u.user_id = f.user_id
WHERE u.nick LIKE 'Sat%' --nickname of our user's friend
GROUP BY u.user_id
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.