I have the result of a groupBy on the vertices of a graph:
val filteredNodesGroups = somegraph.vertices.groupBy {
  case (_, attr) => attr
}
I would like to create a new graph for each group of vertices, for example:
for ((i, nodegroup) <- filteredNodesGroups) {
  ...<transformation to produce a nodegroupRDD from nodegroup>...
  var gr = Graph(nodegroupRDD, somegraph.edges)
}
The problem is that nodegroup is of type Iterable[(VertexId, String)], meaning that each nodegroup is no longer an RDD.
How can I get past this, that is, how can I recreate an RDD for each nodegroup? In other words, what can I replace the ...<>... code with to make it work?
I tried to use parallelize, but from what I have read that should not be possible here, and it is not the correct way to do this either.
I would appreciate any help. Cheers
If the number of unique attributes is relatively small, you can collect them and create the RDDs locally:
val attrs = somegraph.vertices.map { case (_, attr) => attr }.distinct.collect
val graphs = attrs.map(attr => {
  val vertices = somegraph.vertices.filter { case (_, someAttr) =>
    someAttr == attr
  }
  val edges = somegraph.edges.filter(...)
  Graph(vertices, edges)
})
Note that you should probably filter the edges as well. Otherwise Graph adds every vertex that an edge refers to but that is missing from vertices, and you'll get a bunch of vertices with a null attribute.
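One way to fill in the edge filter is to go through the triplets, since a plain Edge carries only the endpoint IDs and not their attributes. The following is a minimal sketch, not a definitive implementation: the helper name graphsByAttr is hypothetical, the Int edge attribute type is an assumption (the question only shows String vertex attributes), and the top-level def is meant for spark-shell.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Hypothetical helper: builds one graph per distinct vertex attribute.
// Assumes String vertex attributes and Int edge attributes.
def graphsByAttr(somegraph: Graph[String, Int]): Map[String, Graph[String, Int]] = {
  // Collected to the driver, so this assumes few distinct attributes.
  val attrs = somegraph.vertices.map { case (_, attr) => attr }.distinct.collect

  attrs.map { attr =>
    val vertices = somegraph.vertices.filter { case (_, someAttr) => someAttr == attr }

    // Keep an edge only if both endpoints carry the current attribute,
    // and copy it into a fresh Edge so no triplet object is retained.
    val edges: RDD[Edge[Int]] = somegraph.triplets
      .filter(t => t.srcAttr == attr && t.dstAttr == attr)
      .map(t => Edge(t.srcId, t.dstId, t.attr))

    attr -> Graph(vertices, edges)
  }.toMap
}
```

Each resulting graph then contains only the vertices of one group and the edges that run between them, so no null-attribute vertices are created.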
Another approach is to use GraphOps.filter. It is probably more efficient, but you still need to provide the values to filter on.
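As a rough sketch of the GraphOps.filter variant: the helper name subgraphFor and the Int edge attribute type are assumptions for illustration, and the identity preprocess step is used only because no preprocessing is needed here. The vertex predicate keeps one group, and edges whose endpoints were dropped are removed along with them.

```scala
import org.apache.spark.graphx.{Graph, VertexId}

// Hypothetical helper: restricts the graph to vertices with the given attribute.
// Assumes String vertex attributes and Int edge attributes.
def subgraphFor(somegraph: Graph[String, Int], attr: String): Graph[String, Int] =
  somegraph.filter[String, Int](
    // No preprocessing needed, so pass the graph through unchanged.
    preprocess = g => g,
    // Keep only vertices whose attribute matches the requested value.
    vpred = (_: VertexId, someAttr: String) => someAttr == attr
  )
```

You would still collect the distinct attributes first and call this once per value, for example attrs.map(attr => attr -> subgraphFor(somegraph, attr)).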