
Social graphs using bitsets

I came across the following line in an article where an internet technology firm talks about how they baked social features into their application:

Apache Thrift, Krati Data Store, JavaEWAH Compressed Bitmaps and JRuby forms the part of our remote service which stores our social graph in high-performing persistent compressed bitmap format.

I am trying to make sense of this. So far I have figured out what is meant by Apache Thrift (and why it is used), JavaEWAH, bit sets, social graph and GUI analysis. The Krati data store does not seem to have a good wiki or tutorial of its own. Furthermore, I cannot understand the setup, that is, how the social graph is being stored and processed using bitsets and the mentioned technologies.

Could you explain this setup and point me to relevant resources? Alternatively, could you suggest a better alternative to the stack described?

OK, let's put some basics up front:

I guess this is the article you mean: http://www.nextbigwhat.com/technology-implementation-for-social-features-297/

http://en.wikipedia.org/wiki/Social_graph 'The social graph in the Internet context is a graph that depicts personal relations of internet users'

http://thrift.apache.org/ Apache Thrift combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml and Delphi and other languages.

https://github.com/krati/krati Krati is a simple persistent data store with very low latency and high throughput. It is designed for easy integration with read-write-intensive applications with little effort in tuning configuration, performance and JVM garbage collection.

http://code.google.com/p/javaewah/ The bit array data structure is implemented in Java as the BitSet class. ... JavaEWAH is a word-aligned compressed variant of the Java bitset class.

http://jruby.org/apidocs/serialized-form.html ....
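
To make the "word-aligned compressed" idea from the JavaEWAH description above a bit more concrete, here is a toy sketch in Python. It is not the actual EWAH encoding and not JavaEWAH's API; it only shows the general principle of collapsing runs of all-zero machine words. The 64-bit word size and all function names are assumptions made for illustration.

```python
# Toy word-aligned run-length compression for a bitmap.
# Assumption: NOT the real EWAH format, only the underlying idea.
# Python ints are used as arbitrary-length bitsets.

WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def compress(bitmap: int) -> list:
    """Encode a bitmap as ("zeros", run_length) and ("literal", word) items."""
    out = []
    while bitmap:
        word = bitmap & WORD_MASK          # lowest 64-bit word
        bitmap >>= WORD_BITS
        if word == 0 and out and out[-1][0] == "zeros":
            out[-1] = ("zeros", out[-1][1] + 1)   # extend the current run
        elif word == 0:
            out.append(("zeros", 1))              # start a new run
        else:
            out.append(("literal", word))         # keep the word verbatim
    return out


def decompress(items: list) -> int:
    """Rebuild the original bitmap from the encoded items."""
    bitmap, shift = 0, 0
    for kind, value in items:
        if kind == "zeros":
            shift += value * WORD_BITS     # skip over the empty words
        else:
            bitmap |= value << shift
            shift += WORD_BITS
    return bitmap


# A sparse bitmap: only bits 3 and 100000 are set.
sparse = (1 << 3) | (1 << 100_000)
encoded = compress(sparse)
```

In this example the 1,563 words of the uncompressed bitmap collapse to three items: a literal, one run of empty words, and another literal. Sparse membership bitmaps are where this kind of scheme pays off.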

----- Here are my interpretations:

The context of the article is technology implementation, so they listed everything. In this context I guess we can ignore Apache Thrift for now, as it is just the glue they use to attach the technologies to each other. JRuby forms are also somewhat out of scope for the social graph considerations. Yes, a social graph needs input and output, but forms only address which level of detail comes in from there.

The interesting part is Krati and JavaEWAH. Reading the article makes it obvious that they implement their social graph via memberships. This can be about groups or roles or something similar. Memberships can be implemented as a bitmap: each group has a bitmap with one bit per user, and each bit can be addressed to check whether that user is a member or not. As simple as that. Presumably the bitmaps are built and compressed with JavaEWAH and then persisted in Krati, since Krati is the data store and JavaEWAH the bitset implementation. The con: the more users, the bigger the bitmap gets. The pro: it is FAST.
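
The membership idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not the article's implementation: Python ints stand in for the bitsets (in the stack described, a java.util.BitSet or a JavaEWAH bitmap would take that role), and the group names, user ids and helper functions are hypothetical.

```python
# Minimal sketch: one bitmap per group, one bit per user.
# Assumption: user ids are small non-negative integers used as bit positions.


def add_member(bitmap: int, user_id: int) -> int:
    """Return the bitmap with the bit for user_id set."""
    return bitmap | (1 << user_id)


def is_member(bitmap: int, user_id: int) -> bool:
    """Check a single bit: is user_id in the group?"""
    return (bitmap >> user_id) & 1 == 1


def members(bitmap: int) -> list:
    """List the user ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical groups, each starting as an empty bitmap.
groups = {"hikers": 0, "chess": 0}
for uid in (1, 4, 7):
    groups["hikers"] = add_member(groups["hikers"], uid)
for uid in (4, 5, 7, 9):
    groups["chess"] = add_member(groups["chess"], uid)

# Users in both groups: a single bitwise AND over the two bitmaps.
common = groups["hikers"] & groups["chess"]
```

A membership check touches one bit, and "who is in both groups" is one AND over whole machine words, which is where the speed comes from.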

In relational databases each relation would be implemented as a foreign-key-to-foreign-key pair, which causes some index overhead (e.g. 2 ints for the keys and then 2*2+x ints for the double index, where x depends on the database). Especially with lots of memberships per group, disk space utilization can become a challenge. So I guess in such cases the compressed bitmap implementation is even better in terms of storage utilization.
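
As a rough back-of-the-envelope check of that estimate, the following sketch compares the two layouts. The 4-byte integer size, the default x = 0 and the user counts are assumptions chosen purely for illustration; real databases add further per-row and per-page overhead.

```python
# Rough storage comparison: relational pair rows vs. one bitmap per group.
# Assumptions: 4-byte integer keys, x extra index ints per row (default 0).

INT_BYTES = 4


def relational_bytes(memberships: int, x_ints: int = 0) -> int:
    """Cost of (user_id, group_id) rows plus a double index.

    Per membership: 2 ints for the keys and 2*2 + x ints for the indexes,
    following the estimate in the text above.
    """
    ints_per_row = 2 + (2 * 2 + x_ints)
    return memberships * ints_per_row * INT_BYTES


def bitmap_bytes(total_users: int) -> int:
    """Uncompressed bitmap: one bit per user, whether a member or not."""
    return (total_users + 7) // 8


total_users = 1_000_000
for n in (1_000, 100_000, 500_000):
    print(n, relational_bytes(n), bitmap_bytes(total_users))
```

Under these assumptions the uncompressed bitmap costs a fixed 125,000 bytes per group, so it loses against the pair table for a small group (24,000 bytes for 1,000 members) and wins clearly for a large one (2,400,000 bytes for 100,000 members). Compression is what keeps the small, sparse groups cheap as well.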

UPDATE---

One could write books on the whole topic, so I guess I need to stop somewhere. However, good starting points from here are:

http://www.slideshare.net/lemire/all-about-bitmap-indexes-and-sorting-them

https://github.com/jingwei/krati/commit/ab1432003e59a07269d23c1cb307625b0e8c5be2

http://en.wikipedia.org/wiki/Data_store http://en.wikipedia.org/wiki/Key-value_store (to get an idea of database concepts other than just the relational one)

http://dev.mysql.com/doc/refman/5.0/en/innodb-physical-record.html (to get some indication of the costs of a foreign-key-to-foreign-key relation)
