Create Edges from Vertices with Spark
Let's say I have an array of vertices and I want to create edges from them so that each vertex connects to the next x vertices, where x can be any integer value. Is there a way to do that with Spark?
This is what I have in Scala so far:
import org.apache.spark.graphx.Edge

// array that holds the edges
var edges = Array.empty[Edge[Double]]
for (j <- 0 to vertices.size - 2) {
  for (i <- 1 to x) {
    if ((j + i) < vertices.size) {
      // add edge
      edges = edges ++ Array(Edge(vertices(j)._1, vertices(j + i)._1, 1.0))
      // add inverse edge, we want both directions
      edges = edges ++ Array(Edge(vertices(j + i)._1, vertices(j)._1, 1.0))
    }
  }
}
Here vertices is an array of (Long, String). The whole process is, of course, sequential.
Edit:
For example, if I have the vertices Hello, World, and, Planet, cosmos, I need the following edges: Hello -> World, World -> Hello, Hello -> and, and -> Hello, Hello -> Planet, Planet -> Hello, World -> and, and -> World, World -> Planet, Planet -> World, World -> cosmos, cosmos -> World, and so on.
Do you mean something like this?
// n is the number of following vertices to connect to (x in the question)
// Add dummy vertices at the end (assumes that you don't use negative ids)
(vertices ++ Array.fill(n)((-1L, null)))
  .sliding(n + 1) // Slide over n + 1 vertices at a time
  .flatMap(arr => {
    val (srcId, _) = arr.head // Take the first vertex as the source
    // Generate 2n edges, one per direction for each following vertex
    arr.tail.flatMap { case (dstId, _) =>
      Array(Edge(srcId, dstId, 1.0), Edge(dstId, srcId, 1.0))
    }
  }.filter(e => e.srcId != -1L && e.dstId != -1L)) // Drop edges that touch dummies
  .toArray
If you want to run it on an RDD, you simply adjust the initial step like this:
import org.apache.spark.mllib.rdd.RDDFunctions._ // provides sliding on RDDs

// Index of the last partition; only that partition is padded with dummies
val nPartitions = vertices.partitions.size - 1
vertices.mapPartitionsWithIndex((i, iter) =>
  if (i == nPartitions) (iter ++ Array.fill(n)((-1L, null))).toIterator
  else iter)
And, of course, drop toArray. If you want circular connections (tail connected to head), you can replace Array.fill(n)((-1L, null)) with vertices.take(n) and drop the filter.
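The circular variant can be sketched on a local array as follows. This is only an illustration under some assumptions: the Edge case class is a stand-in for org.apache.spark.graphx.Edge so that the snippet runs without Spark, circularEdges is a hypothetical helper name, and 2 * n is assumed to be smaller than the number of vertices (otherwise the wrap-around produces duplicate edges that would need a distinct).

```scala
// Stand-in for org.apache.spark.graphx.Edge so the sketch runs without Spark (assumption)
final case class Edge[A](srcId: Long, dstId: Long, attr: A)

// Hypothetical helper: connect every vertex to the next n vertices, wrapping around
def circularEdges(vertices: Array[(Long, String)], n: Int): Array[Edge[Double]] =
  (vertices ++ vertices.take(n)) // pad with the head of the array instead of dummies
    .sliding(n + 1)              // one window per original vertex
    .flatMap { window =>
      val (srcId, _) = window.head
      // both directions for each of the n vertices that follow the source
      window.tail.toSeq.flatMap { case (dstId, _) =>
        Seq(Edge(srcId, dstId, 1.0), Edge(dstId, srcId, 1.0))
      }
    }
    .toArray
```

Because the padding consists of real vertices, no filter step is needed: every window starts at one original vertex and contains only valid ids.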
I think this will get you what you want:
First off, I define a little helper function (note that I have set the edge data here to the vertex names so it's easier to inspect visually):
def pairwiseEdges(list: List[(Long, String)]): List[Edge[String]] = {
  list match {
    case x :: xs =>
      xs.flatMap(i => List(
        Edge(x._1, i._1, x._2 + "--" + i._2),
        Edge(i._1, x._1, i._2 + "--" + x._2))) ++ pairwiseEdges(xs)
    case Nil => List.empty
  }
}
I do a zipWithIndex on your array to get a key, and then convert the array to an RDD:
val vertices = List((1L,"hello"), (2L,"world"), (3L,"and"), (4L, "planet"), (5L,"cosmos")).toArray
val indexedVertices = vertices.zipWithIndex
val rdd = sc.parallelize(indexedVertices)
And then to generate the edges with x=3:
val edges = rdd
  .flatMap { case ((vertexId, name), index) =>
    // Emit the vertex under its own index and under the 3 preceding indices
    for { i <- 0 to 3; if (index - i) >= 0 } yield (index - i, (vertexId, name))
  }
  .groupByKey()
  .flatMap { case (index, iterable) => pairwiseEdges(iterable.toList) }
  .distinct()
EDIT: Rewrote the flatMap and removed the filter as suggested by @zero323 in the comments.
This will generate the following output:
Edge(1,2,hello--world)
Edge(1,3,hello--and)
Edge(1,4,hello--planet)
Edge(2,1,world--hello)
Edge(2,3,world--and)
Edge(2,4,world--planet)
Edge(2,5,world--cosmos)
Edge(3,1,and--hello)
Edge(3,2,and--world)
Edge(3,4,and--planet)
Edge(3,5,and--cosmos)
Edge(4,1,planet--hello)
Edge(4,2,planet--world)
Edge(4,3,planet--and)
Edge(4,5,planet--cosmos)
Edge(5,2,cosmos--world)
Edge(5,3,cosmos--and)
Edge(5,4,cosmos--planet)
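To see why the grouping step yields exactly these 18 edges, the same logic can be replayed on plain Scala collections. This is a rough sketch, not the Spark job itself: Edge is a stand-in for org.apache.spark.graphx.Edge, windowedEdges is a hypothetical name, and groupBy plus toSet take the place of groupByKey and distinct.

```scala
// Stand-in for org.apache.spark.graphx.Edge so the sketch runs without Spark (assumption)
final case class Edge[A](srcId: Long, dstId: Long, attr: A)

// Same helper as in the answer: both directions for every pair in the list
def pairwiseEdges(list: List[(Long, String)]): List[Edge[String]] = list match {
  case x :: xs =>
    xs.flatMap(i => List(
      Edge(x._1, i._1, x._2 + "--" + i._2),
      Edge(i._1, x._1, i._2 + "--" + x._2))) ++ pairwiseEdges(xs)
  case Nil => List.empty
}

// Hypothetical local equivalent of flatMap / groupByKey / flatMap / distinct
def windowedEdges(vertices: Seq[(Long, String)], x: Int): Set[Edge[String]] =
  vertices.zipWithIndex
    .flatMap { case (vertex, index) =>
      // key k collects the vertices at positions k .. k + x
      for (i <- 0 to x if index - i >= 0) yield (index - i, vertex)
    }
    .groupBy(_._1)
    .values
    .flatMap(group => pairwiseEdges(group.map(_._2).toList))
    .toSet // overlapping windows repeat pairs, so deduplicate
```

Each key k gathers the vertices at positions k to k + x, so two vertices end up in a common group exactly when they are at most x positions apart. Neighbouring windows overlap, which is why the deduplication step is required.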