
Efficient way to implement a LinkedIn-like "How you are connected to" feature?

Original: 2009-10-13 05:10:19 9 2 database/ hardware/ performance/ graph/ social-networking

LinkedIn has a cool feature: when you visit a user's profile, it shows how you are connected to that user through the network.

Assume the visitor and the profile owner are two nodes of a graph in which nodes represent users and edges represent friendships. A simple solution would be to run a BFS from both nodes, up to a certain level, and check whether the two searches intersect. The intersections would be the link nodes in the network.
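The two-sided BFS described above can be sketched as follows. This is a minimal in-memory sketch, not LinkedIn's implementation: the `friends` dictionary, the function name `find_connection`, and the `max_levels` parameter are assumptions made for illustration, and a real system would fetch each frontier's friend lists from storage.

```python
def find_connection(friends, visitor, owner, max_levels=3):
    """Return a chain of users linking visitor to owner, or None.

    friends: dict mapping each user to a set of friends (assumed undirected).
    max_levels: how many levels each side may expand (illustrative limit).
    """
    if visitor == owner:
        return [visitor]

    # Each parent map doubles as the visited set for its search direction.
    parents_a = {visitor: None}
    parents_b = {owner: None}
    frontier_a, frontier_b = [visitor], [owner]

    def expand(frontier, parents, other_parents):
        """Expand one full level; return (meeting node or None, next frontier)."""
        next_frontier = []
        for user in frontier:
            for friend in friends.get(user, ()):
                if friend in parents:
                    continue
                parents[friend] = user
                if friend in other_parents:
                    return friend, next_frontier
                next_frontier.append(friend)
        return None, next_frontier

    def walk(parents, node):
        """Follow parent links from node back to the search root."""
        path = []
        while node is not None:
            path.append(node)
            node = parents[node]
        return path

    for _ in range(max_levels):
        meet, frontier_a = expand(frontier_a, parents_a, parents_b)
        if meet is None:
            meet, frontier_b = expand(frontier_b, parents_b, parents_a)
        if meet is not None:
            left = walk(parents_a, meet)[::-1]  # visitor ... meet
            right = walk(parents_b, meet)[1:]   # node after meet ... owner
            return left + right
        if not frontier_a or not frontier_b:
            return None  # one side ran out of people to explore
    return None


network = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "erin": set(),
}
print(find_connection(network, "alice", "dave"))
```

Expanding from both ends keeps each frontier much smaller than a single search that has to reach the full depth, which is why the question's idea is attractive despite the query cost per level.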

Although this sounds neat, the problem is that a separate DB query is needed to determine the friends of each person. Once the search goes deeper than 2 levels, the algorithm becomes very time consuming. Is there a more efficient alternative? If not, how can we add better hardware support (parallel computing, grids, distributed databases, etc.) to bring down the computation time?

2 answers

You can see how this can be done in the article "Graphs in the database: SQL meets social networks" by Lorenzo Alberton. The example code is written for PostgreSQL using CTEs. However, I doubt that using an RDBMS for this will perform well. I wrote up an article on how to do the same things as in the mentioned article using a native graph database, in this case Neo4j: "Social networks in the database: using a graph database". Other than the differences in performance, a graph database also simplifies the task by providing a graph API that makes it easy to handle traversals that would be extremely complex to write in SQL (or by using stored procedures). I wrote a bit more on graph databases in this thread and see this one too.
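To give a feel for the CTE approach, here is a small sketch of a recursive query that finds a connection path in one round trip. It is not the code from Alberton's article: it uses SQLite through Python's `sqlite3` module instead of PostgreSQL, and the `friendship` table, its columns, and the `connection_path` helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friendship (user_a TEXT, user_b TEXT)")
edges = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
# Friendship is symmetric, so store each edge in both directions.
conn.executemany(
    "INSERT INTO friendship VALUES (?, ?)",
    edges + [(b, a) for a, b in edges],
)

# The path column is a '/'-delimited string, used to avoid revisiting users.
PATH_QUERY = """
WITH RECURSIVE reach(person, depth, path) AS (
    SELECT :start, 0, '/' || :start || '/'
    UNION ALL
    SELECT f.user_b, r.depth + 1, r.path || f.user_b || '/'
    FROM reach AS r
    JOIN friendship AS f ON f.user_a = r.person
    WHERE r.depth < :max_depth
      AND instr(r.path, '/' || f.user_b || '/') = 0
)
SELECT path FROM reach WHERE person = :target ORDER BY depth LIMIT 1
"""


def connection_path(conn, start, target, max_depth=3):
    """Return the users on a shortest path within max_depth, or None."""
    row = conn.execute(
        PATH_QUERY, {"start": start, "target": target, "max_depth": max_depth}
    ).fetchone()
    return row[0].strip("/").split("/") if row else None


print(connection_path(conn, "alice", "dave"))
```

Note that this query enumerates every simple path up to `max_depth`, so its cost grows quickly with depth on a dense network, which is in line with the doubt about RDBMS performance expressed above.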

Without some kind of recursive stored procedure or recursive query (a CTE in SQL Server 2005+), you'll need multiple round trips as the levels get deeper. However, a good cache infrastructure could really help performance, as the connection lists of the most popular / active users would remain cached. A read-through/write-through cache mechanism would make things even better (cache updates cascade to DB updates, cache reads cascade to DB reads).
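A read-through/write-through cache for connection lists might look like the sketch below. Everything here is an assumption for illustration: `InMemoryFriendStore` stands in for the real database, `FriendListCache` and its method names are hypothetical, and the least-recently-used eviction is just one possible policy for keeping active users' lists in memory.

```python
from collections import OrderedDict


class InMemoryFriendStore:
    """Stand-in for the database; counts calls to show the cache's effect."""

    def __init__(self, data):
        self.data = {user: set(friends) for user, friends in data.items()}
        self.reads = 0
        self.writes = 0

    def load_friends(self, user):
        self.reads += 1
        return set(self.data.get(user, ()))

    def save_friends(self, user, friends):
        self.writes += 1
        self.data[user] = set(friends)


class FriendListCache:
    """LRU cache that reads through to and writes through to the store."""

    def __init__(self, store, capacity=10_000):
        self.store = store
        self.capacity = capacity
        self._entries = OrderedDict()

    def get(self, user):
        if user in self._entries:
            self._entries.move_to_end(user)  # mark as recently used
            return self._entries[user]
        friends = self.store.load_friends(user)  # read-through on a miss
        self._remember(user, friends)
        return friends

    def put(self, user, friends):
        friends = set(friends)
        self.store.save_friends(user, friends)  # write-through to the store
        self._remember(user, friends)

    def _remember(self, user, friends):
        self._entries[user] = friends
        self._entries.move_to_end(user)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict least recently used


store = InMemoryFriendStore({"alice": {"bob"}})
cache = FriendListCache(store, capacity=2)
cache.get("alice")
cache.get("alice")
print(store.reads)  # the second lookup is served from the cache
```

A traversal that asks this cache for each user's friends only hits the database for users whose lists are not already in memory, which is where the saving for popular users comes from.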