Titan's "supernodes"

Question

I'm developing an application that could work well with a graph database (Titan), except that it has problems with vertices that have many edges, i.e. supernodes.

The supernodes link above points to a blog post from the authors of Titan, explaining a way to resolve the problem. The solution seems to be reducing the number of vertices by filtering on edges.

Unfortunately I want to groupCount attributes of edges or vertices. For example, I have 1 million users and each user belongs to a country. How can I do a fast groupCount to work out the number of users in each country?

What I've tried so far can be shown in this elaborate Groovy script:

g = TitanFactory.open('titan.properties')  // Cassandra
r = new Random(100)
people = 1e6

def newKey(g, name, type) {
    return g
        .makeType()
        .name(name)
        .simple()
        .functional()
        .indexed()
        .dataType(type)
        .makePropertyKey()
}

def newLabel(g, name, key) {
    return g
        .makeType()
        .name(name)
        .primaryKey(key)
        .makeEdgeLabel()
}

country = newKey(g, 'country', String.class)
newLabel(g, 'lives', country)

g.stopTransaction(SUCCESS)

root = g.addVertex()
countries = ['AU', 'US', 'CN', 'NZ', 'UK', 'PL', 'RU', 'NL', 'FR', 'SP', 'IT']

(1..people).each {
    country = countries[(r.nextFloat() * countries.size()).toInteger()]
    g.startTransaction()
    person = g.addVertex([name: 'John the #' + it])
    g.addEdge(g.getVertex(root.id), person, 'lives', [country: country])
    g.stopTransaction(SUCCESS)
}

t0 = new Date().time

m = [:]    
root = g.getVertex(root.id)
root.outE('lives').country.groupCount(m).iterate()

t1 = new Date().time

println "groupCount seconds: " + ((t1 - t0) / 1000)

Basically there is one root node (because Titan has no "all nodes" lookup), linked to many person vertices via edges that carry the country property. When I run groupCount() on 1 million vertices, it takes over a minute.

I realise Titan is probably iterating over each edge and collecting counts, but is there a way to make this run faster in Titan, or in any other graph database? Can the index itself be counted so it doesn't have to traverse? Are my indexes correct?

Answer 1

If you make 'country' a primary key for the 'lives' label, you can retrieve all people for a particular country more quickly. However, in your case you are interested in a group count, which requires all edges of that root node to be retrieved in order to iterate over them and bucket the countries.
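To make the difference concrete, here is a minimal sketch in plain Groovy with no Titan involved. The `edges` list and the `byCountry` map are hypothetical stand-ins for the root vertex's 'lives' edges and for an index keyed on country; neither is Titan's API. It only illustrates why a lookup for one country can skip most edges while a group count has to touch every one of them.

```groovy
// Hypothetical stand-in for the 'lives' edges of the root vertex:
// each edge is just a map with a 'country' property.
def countries = ['AU', 'US', 'CN', 'NZ']
def r = new Random(100)
def edges = (1..10000).collect {
    [person: it, country: countries[r.nextInt(countries.size())]]
}

// Stand-in for an index keyed on country: the edges for one country
// can be fetched without looking at any other edge.
def byCountry = edges.groupBy { it.country }
def australians = byCountry['AU']

// A group count has no such shortcut: every edge is visited once and
// its country is bucketed, so the work grows with the edge count.
def counts = [:].withDefault { 0 }
def visited = 0
edges.each { e ->
    counts[e.country] += 1
    visited++
}

println "edges visited for groupCount: ${visited}"
println "counts: ${counts}"
```

In this sketch the number of visited edges always equals the total number of edges, which is the cost the root-vertex approach pays in the question.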

Hence, this analytical query is much better suited to the graph analytics framework Faunus. It does not require a root vertex, because it executes the group count by way of a complete database scan and thus avoids the supernode problem. Faunus also uses Gremlin as its query language, so you only have to modify your query slightly:

g.V.country.groupCount.cap...
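For intuition only, the full-scan idea can be sketched in plain Groovy as a map/reduce-style count. This is not Faunus code: the `vertices` list, the partition size of 1000 and the `countPartition` and `mergeCounts` closures are assumptions made for illustration. Each partition of the scan is counted independently and the partial counts are then merged, so no single root vertex has to hand over all of its edges.

```groovy
// Hypothetical vertex records; here each person carries its country directly.
def countries = ['AU', 'US', 'CN', 'NZ']
def r = new Random(100)
def vertices = (1..10000).collect {
    [name: "John the #${it}", country: countries[r.nextInt(countries.size())]]
}

// "Map" step: count countries inside one partition of the scan.
def countPartition = { part -> part.countBy { it.country } }

// "Reduce" step: merge two partial count maps by adding values per key.
def mergeCounts = { a, b ->
    def merged = new HashMap(a)
    b.each { k, v -> merged[k] = (merged[k] ?: 0) + v }
    merged
}

// Split the scan into fixed-size partitions (size chosen arbitrarily),
// count each partition on its own, then merge the partial results.
def partials = vertices.collate(1000).collect(countPartition)
def total = partials.inject([:], mergeCounts)

println total
```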

HTH, Matthias

Answer 1 (8, accepted, 2012-11-19 22:30:11)
