Efficient way to implement a LinkedIn-like "How you are connected to" feature?

LinkedIn has a cool feature: when you visit another user's profile, it shows how you are connected to that user through the network.

Assume that the visitor and the profile owner are two nodes of a graph in which nodes represent users and edges represent friendships. A simple solution would be to run a BFS from both nodes up to a certain depth and check whether the two searches intersect. The nodes in the intersection would be the connecting links in the network.
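The two-sided BFS described above could look roughly like the sketch below. It assumes the friendship graph is already available in memory as a dict of sets; the function name `connection_path`, the `friends` mapping and the `max_depth` parameter are made up for illustration. Both sides are expanded one whole level at a time, and the search stops as soon as a newly reached user has already been seen from the other side.

```python
def connection_path(friends, source, target, max_depth=3):
    """Return one chain of users linking source to target, or None.

    friends:   dict mapping each user to a set of friends (hypothetical input).
    max_depth: number of levels expanded from each side.
    """
    if source == target:
        return [source]

    # Each map records "who led us to this user"; it doubles as the visited set.
    parents_src = {source: None}
    parents_dst = {target: None}
    frontier_src = [source]
    frontier_dst = [target]

    def expand(frontier, parents, other_parents):
        """Advance one BFS level; report a user also reached from the other side."""
        next_frontier = []
        meet = None
        for user in frontier:
            for friend in friends.get(user, ()):
                if friend in parents:
                    continue
                parents[friend] = user
                next_frontier.append(friend)
                if meet is None and friend in other_parents:
                    meet = friend
        return next_frontier, meet

    def chain(parents, node):
        """Walk parent links from node back to the side's starting user."""
        path = []
        while node is not None:
            path.append(node)
            node = parents[node]
        return path

    for _ in range(max_depth):
        frontier_src, meet = expand(frontier_src, parents_src, parents_dst)
        if meet is None:
            frontier_dst, meet = expand(frontier_dst, parents_dst, parents_src)
        if meet is not None:
            left = chain(parents_src, meet)[::-1]   # source ... meet
            right = chain(parents_dst, meet)[1:]    # after meet ... target
            return left + right
        if not frontier_src or not frontier_dst:
            return None  # one side ran out of users to explore
    return None


demo = {
    "alice": {"bob"},
    "bob": {"alice", "carol"},
    "carol": {"bob", "dave"},
    "dave": {"carol"},
    "eve": set(),
}
print(connection_path(demo, "alice", "dave"))
```

Searching from both ends keeps each frontier much smaller than a one-sided search to the same total distance, which is the main reason the approach is attractive for this kind of lookup.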

Although this sounds neat, the problem is that finding the friends of each person requires a separate DB query. Once the search goes deeper than 2 levels, the algorithm becomes very time-consuming. Is there a more efficient alternative? If not, how can better hardware support (parallel computing, grids, a distributed database, etc.) be used to bring down the computation time?
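One way to soften the query-per-person cost, independent of hardware, is to fetch the friends of an entire BFS level in a single query. The sketch below uses Python's built-in `sqlite3` module purely for illustration; the `friendship(user_id, friend_id)` table, the helper names and the demo data are assumptions, not part of any real schema.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory friendship table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER)")
    pairs = [(1, 2), (2, 3), (3, 4)]
    # Store each friendship in both directions so lookups are symmetric.
    rows = pairs + [(b, a) for a, b in pairs]
    conn.executemany("INSERT INTO friendship VALUES (?, ?)", rows)
    return conn


def friends_of_many(conn, user_ids):
    """Fetch the friends of every user in user_ids with one round trip."""
    ids = list(user_ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    query = (
        "SELECT user_id, friend_id FROM friendship "
        f"WHERE user_id IN ({placeholders})"
    )
    result = {uid: set() for uid in ids}
    for user_id, friend_id in conn.execute(query, ids):
        result[user_id].add(friend_id)
    return result


conn = make_demo_db()
print(friends_of_many(conn, [1, 2]))
```

With this shape, a search to depth 3 costs about one query per level instead of one per visited user. Very large frontiers would need to be split into chunks, since databases limit the number of bound parameters per statement.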

You can see how this can be done in the article "Graphs in the database: SQL meets social networks" by Lorenzo Alberton. The example code is written for PostgreSQL using CTEs. However, I doubt that using an RDBMS for this will perform well. I wrote up an article on how to do the same things as in that article using a native graph database, in this case Neo4j: "Social networks in the database: using a graph database". Apart from the difference in performance, a graph database also simplifies the task by providing a graph API that makes it easy to handle traversals that would be extremely complex to write in SQL (or with stored procedures). I wrote a bit more about graph databases in this thread, and see this one too.
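To give a feel for the CTE approach, here is a minimal sketch of a recursive query that walks the friendship graph inside the database. It is not the code from either article: it runs against SQLite through Python's `sqlite3` module rather than PostgreSQL, and the `friendship(user_id, friend_id)` table and the function name `find_path` are assumptions made for the example.

```python
import sqlite3

PATH_QUERY = """
WITH RECURSIVE walk(node, path, depth) AS (
    SELECT ?, ',' || ? || ',', 0
    UNION ALL
    SELECT f.friend_id, walk.path || f.friend_id || ',', walk.depth + 1
    FROM walk
    JOIN friendship AS f ON f.user_id = walk.node
    WHERE walk.depth < ?
      -- skip users already on this path to avoid cycles
      AND instr(walk.path, ',' || f.friend_id || ',') = 0
)
SELECT path FROM walk WHERE node = ? ORDER BY depth LIMIT 1
"""


def make_demo_db():
    """Build a tiny in-memory friendship table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE friendship (user_id INTEGER, friend_id INTEGER)")
    pairs = [(1, 2), (2, 3), (3, 4)]
    conn.executemany(
        "INSERT INTO friendship VALUES (?, ?)",
        pairs + [(b, a) for a, b in pairs],  # both directions
    )
    return conn


def find_path(conn, source, target, max_depth=3):
    """Return a list of user ids from source to target, or None."""
    row = conn.execute(
        PATH_QUERY, (source, source, max_depth, target)
    ).fetchone()
    if row is None:
        return None
    return [int(part) for part in row[0].strip(",").split(",")]


conn = make_demo_db()
print(find_path(conn, 1, 4))
```

The whole traversal happens in one round trip, but the query enumerates every cycle-free path up to `max_depth`, so the amount of work grows quickly on a dense network. That growth is consistent with the doubt expressed above about RDBMS performance.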

Without some kind of recursive query (a recursive stored procedure, or a CTE in SQL Server 2005+), you'll need multiple round trips as the levels get deeper. However, a good cache infrastructure could really help performance, as the connection lists of the most popular or active users would remain cached. A read-through/write-through cache mechanism would make things even better (cache updates cascade to DB updates, cache reads cascade to DB reads).
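The read-through/write-through idea can be sketched as follows. This is only an illustration: the class name `FriendCache` is invented, a plain dict stands in for the database, and there is no eviction, which a real cache would need (for example an LRU policy).

```python
class FriendCache:
    """Read-through / write-through cache over a friend-list store.

    `store` is a hypothetical stand-in for the database: a dict mapping
    each user to a set of friends.
    """

    def __init__(self, store):
        self._store = store
        self._cache = {}
        self.store_reads = 0  # counts how often the store was actually hit

    def friends(self, user):
        """Read-through: on a miss, load from the store and remember it."""
        if user not in self._cache:
            self.store_reads += 1
            self._cache[user] = set(self._store.get(user, ()))
        return set(self._cache[user])  # copy, so callers cannot corrupt the cache

    def add_friendship(self, a, b):
        """Write-through: update the store, then any cached entries."""
        self._store.setdefault(a, set()).add(b)
        self._store.setdefault(b, set()).add(a)
        for user, friend in ((a, b), (b, a)):
            if user in self._cache:
                self._cache[user].add(friend)


store = {"alice": {"bob"}, "bob": {"alice"}}
cache = FriendCache(store)
cache.friends("alice")
cache.friends("alice")    # served from the cache, no second store read
print(cache.store_reads)
```

Because writes go through the cache, cached friend lists stay consistent with the store without needing a separate invalidation step.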
