Merge multiple graphs together in GraphX
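Hi, I have built multiple multigraphs (11 in total).
For example: Graph 1 - SongArtist - SongVertex(Id, SongName) ArtistVertex(Id, ArtistName, NetWorth) Edge(Song, Artist, "Sung")
Graph 2 - SongWriter - SongVertex(Id, SongName) WriterVertex(Id, ArtistName) Edge(Song, Writer, "WrittenBy")
Graph 3 - ArtistWriter - ArtistVertex(Id, ArtistName, NetWorth) WriterVertex(Id, ArtistName) Edge(Artist, Writer, "Collaborated")...
I'd like to be able to merge all of these together into a single graph. Graph1 and Graph2 can be merged on Song, Graph2 and Graph3 on Writer, and Graph1 and Graph3 on Artist.
Some of the graphs have edge and vertex properties defined by case classes. The code below shows how Graph3 is built; the others are structured in much the same way: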
case class ArtistWriterProperties(weight: String, edgeType: String) extends EdgeProperty
case class ArtistProperty(val vertexType: String, val artistName: String, val netWorth: String) extends VertexProperty
case class WriterProperty(val vertexType: String, val writerName: String) extends VertexProperty

val ArtistWriter: RDD[(VertexId, VertexProperty)] = sc.textFile(vertexArtistWriter).map {
  line =>
    val row = line.split(",")
    val id = row(0).toLong
    val vertexType = row(1)
    val prop = vertexType match {
      case "Artist" => ArtistProperty(vertexType, row(2), row(3))
      case "Writer" => WriterProperty(vertexType, row(2))
    }
    (id, prop)
}

val edgesArtistWriterCollaborated: RDD[Edge[EdgeProperty]] = sc.textFile(edgeWeightedArtistWriterCollaborated).map {
  line =>
    val row = line.split(",")
    Edge(row(0).toLong, row(1).toLong, ArtistWriterProperties(row(2), row(3)))
}

val graph3 = Graph(ArtistWriter, edgesArtistWriterCollaborated)
I am trying something like this:
val graph2And3 = Graph(
  graph2.vertices.union(graph3.vertices),
  graph2.edges.union(graph3.edges)
).partitionBy(RandomVertexCut).
  groupEdges( (attr1, attr2) => attr1 + attr2 )
But I get an error: type mismatch.
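So basically, you need to perform a join on the vertices and a union on the edges.
For each graph, you can get an RDD of the vertices and an RDD of the edges.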
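As a minimal sketch of working with those two RDDs, the following pulls `vertices` and `edges` out of two graphs and unions them directly. It assumes the vertex IDs already agree across the graphs (the same entity has the same ID everywhere), which may not hold for the graphs in the question. `UnionSketch`, `Named`, `Weighted` (with a numeric weight rather than the question's `String`), and `mergeEdges` are hypothetical names introduced only for illustration. One point it tries to show is that the function passed to `groupEdges` has to return the edge attribute type itself, which `attr1 + attr2` on a trait without a `+` method does not do.

```scala
import org.apache.spark.graphx.{Edge, Graph, PartitionStrategy, VertexId}
import org.apache.spark.rdd.RDD

object UnionSketch {
  // Hypothetical stand-ins for the question's property hierarchy.
  trait VertexProperty
  trait EdgeProperty
  case class Named(vertexType: String, name: String) extends VertexProperty
  case class Weighted(weight: Double, edgeType: String) extends EdgeProperty

  // Merge function for parallel edges: must return an EdgeProperty.
  // Assumption: weights of parallel edges are summed; other edges keep the first attribute.
  def mergeEdges(x: EdgeProperty, y: EdgeProperty): EdgeProperty = (x, y) match {
    case (Weighted(w1, t), Weighted(w2, _)) => Weighted(w1 + w2, t)
    case _                                  => x
  }

  // Assumes vertex IDs are already consistent across both graphs.
  def unionGraphs(a: Graph[VertexProperty, EdgeProperty],
                  b: Graph[VertexProperty, EdgeProperty]): Graph[VertexProperty, EdgeProperty] = {
    val vertices: RDD[(VertexId, VertexProperty)] = a.vertices.union(b.vertices)
    val edges: RDD[Edge[EdgeProperty]] = a.edges.union(b.edges)
    // Graph(...) keeps one arbitrary attribute when a vertex ID appears more than once.
    Graph(vertices, edges)
      .partitionBy(PartitionStrategy.RandomVertexCut) // groupEdges needs co-located parallel edges
      .groupEdges(mergeEdges)
  }
}
```

If the IDs are local to each graph, a direct union like this would conflate unrelated vertices, so the IDs have to be rebuilt from a shared key first.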
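1) Full outer join the vertex RDDs one after another on the required key, and create new IDs for the final vertices, e.g. (pseudocode) graph1.vertices.fullOuterJoin(graph2.vertices, "SongArtist").fullOuterJoin...
2) Union all the edge RDDs, with their endpoints remapped to the new vertex IDs. You can then create the graph from the new vertex RDD and the edge RDD.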
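The two steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the answerer's actual code: the vertex attribute is simplified to a hypothetical `VertexProperty(vertexType, name)`, the pair `(vertexType, name)` is assumed to identify the same entity in every graph and to be unique within each graph, and edge attributes are plain `String` labels. `MergeByKeySketch`, `keyOf`, and `mergeGraphs` are made-up names. Since the real `fullOuterJoin` on pair RDDs joins on the RDD's key rather than on a named column, each vertex RDD is re-keyed first.

```scala
import org.apache.spark.graphx.{Edge, Graph, VertexId}
import org.apache.spark.rdd.RDD

object MergeByKeySketch {
  // Hypothetical simplified vertex attribute; (vertexType, name) is the join key.
  case class VertexProperty(vertexType: String, name: String)
  type Key = (String, String)

  def keyOf(p: VertexProperty): Key = (p.vertexType, p.name)

  def mergeGraphs(graphs: Seq[Graph[VertexProperty, String]]): Graph[VertexProperty, String] = {
    require(graphs.nonEmpty, "need at least one graph")
    val sc = graphs.head.vertices.sparkContext

    // Step 1a: re-key every vertex RDD by its natural key and chain full outer joins.
    val keyed: Seq[RDD[(Key, VertexProperty)]] =
      graphs.map(_.vertices.map { case (_, p) => (keyOf(p), p) })
    val merged: RDD[(Key, VertexProperty)] = keyed.reduce { (a, b) =>
      // In a full outer join at least one side is defined for every key.
      a.fullOuterJoin(b).mapValues { case (l, r) => l.orElse(r).get }
    }

    // Step 1b: assign a fresh ID to each merged vertex. The result is cached so
    // both views below are read from the same materialized assignment.
    val withIds = merged.zipWithUniqueId().cache()
    val newVertices: RDD[(VertexId, VertexProperty)] =
      withIds.map { case ((_, p), id) => (id, p) }
    val keyToNewId: RDD[(Key, VertexId)] =
      withIds.map { case ((k, _), id) => (k, id) }

    // Step 2: rewrite each graph's edges to the new IDs, then union all edge RDDs.
    val edges: RDD[Edge[String]] = sc.union(graphs.map { g =>
      val oldToNew: RDD[(VertexId, VertexId)] =
        g.vertices.map { case (oldId, p) => (keyOf(p), oldId) }
          .join(keyToNewId)
          .map { case (_, ids) => ids }
      g.edges.map(e => (e.srcId, e))
        .join(oldToNew)
        .map { case (_, (e, newSrc)) => (e.dstId, (newSrc, e.attr)) }
        .join(oldToNew)
        .map { case (_, ((newSrc, attr), newDst)) => Edge(newSrc, newDst, attr) }
    })

    Graph(newVertices, edges)
  }
}
```

When two graphs carry different attributes for the same key, this sketch simply keeps the left one; a real merge would probably want to combine the fields instead.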