
Convert Iterable to RDD in Spark GraphX

I have the result of a groupBy on the vertices of a graph:

    val filteredNodesGroups = somegraph.vertices.groupBy {
        case (_, attr) => attr
    }

and I would like to create a new graph for each group of vertices, for example

    for ((i,nodegroup) <- filteredNodesGroups){

        ...<transformation to produce a nodegroupRDD from nodegroup>...

        val gr = Graph(nodegroupRDD, somegraph.edges)
    }

The problem is that nodegroup is of type Iterable[(VertexId, String)], meaning that each nodegroup is no longer an RDD.

How can I get past this, that is, how can I recreate an RDD for each nodegroup? In other words, what can I replace the ...<>... code with in order to make it work?

I tried using parallelize, but from what I have read that is neither possible here nor the correct way to do this.

I would appreciate any help. Cheers

If the number of unique attributes is relatively small, you can collect them and create the RDDs locally:

    val attrs = somegraph.vertices.map { case (_, attr) => attr }.distinct.collect

    val graphs = attrs.map { attr =>
        val vertices = somegraph.vertices.filter { case (_, someAttr) =>
            someAttr == attr
        }
        val edges = somegraph.edges.filter(...)
        Graph(vertices, edges)
    }

Note that you should filter the edges as well; otherwise the resulting graph will contain a bunch of vertices with a null attribute, because Graph materializes any vertex referenced by an edge but missing from the vertex RDD.
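The edge-filtering logic left elided above can be sketched with plain Scala collections standing in for the RDDs. `AttrSplit` and the sample data are hypothetical names for illustration; the idea is to keep, for each attribute, only the edges whose two endpoints both survive the vertex filter:

```scala
object AttrSplit {
  type VertexId = Long

  /** For each distinct attribute, return (attr, matching vertices,
    * edges whose endpoints both carry that attribute). */
  def split(
      vertices: Seq[(VertexId, String)],
      edges: Seq[(VertexId, VertexId)]
  ): Seq[(String, Seq[(VertexId, String)], Seq[(VertexId, VertexId)])] = {
    val attrs = vertices.map(_._2).distinct
    attrs.map { attr =>
      val vs = vertices.filter { case (_, someAttr) => someAttr == attr }
      // ids of the vertices kept for this group
      val ids = vs.map(_._1).toSet
      // drop edges touching a vertex outside the group
      val es = edges.filter { case (src, dst) => ids(src) && ids(dst) }
      (attr, vs, es)
    }
  }
}
```

In the Spark version, the same id set would be built from the filtered vertex RDD (collected and broadcast, given that the groups are assumed small) and used inside `somegraph.edges.filter`.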

Another approach is to use GraphOps.filter. It is probably more efficient, but you still need to provide the predicates to filter on.
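A minimal sketch of that approach, assuming `somegraph: Graph[String, Int]` (the Int edge-attribute type is a placeholder) and `attr` bound by the surrounding loop; this is not tested against a running cluster:

```scala
import org.apache.spark.graphx.{Graph, VertexId}

// GraphOps.filter applies vpred via subgraph/mask, so edges touching a
// removed vertex are dropped automatically.
val gr = somegraph.filter(
  preprocess = (g: Graph[String, Int]) => g,  // no precomputation needed here
  vpred = (_: VertexId, someAttr: String) => someAttr == attr
)
```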
