How to create a graph from an RDD/DF? Scala Spark

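My RDD contains biological data: protein names and the similarity between them. I want to create a graph in which the vertices are proteins and the edges carry the similarity values. This is what my RDD looks like: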

+-------------+------------+------------+
|   Protein1  |  Protein2  | Similarity |
+-------------+------------+------------+
|    P28469   |   Q70UP5   | 0.11111111 |
|    O45687   |   P00325   |    1.0     |
|    A7ME43   |   Q5HG16   |    0.6     |
|    A4VJT7   |   Q9LD43   |    1.0     |
|    P31937   |   Q64415   | 0.07692308 |
|    A1VAA0   |   Q9L298   |    1.0     |
|    B8DG74   |   Q6MT35   |    1.0     |
+-------------+------------+------------+

Thanks!

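This is not the same data, but this is the approach you need (reading from a file, of course); adapt it to your data:

import org.graphframes.GraphFrame

// Vertex DataFrame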
val v = sqlContext.createDataFrame(List(
  ("a", "Alice", 34),
  ("b", "Bob", 36),
  ("c", "Charlie", 30),
  ("d", "David", 29),
  ("e", "Esther", 32),
  ("f", "Fanny", 36),
  ("g", "Gabby", 60)
)).toDF("id", "name", "age")
// Edge DataFrame
val e = sqlContext.createDataFrame(List(
  ("a", "b", "friend"),
  ("b", "c", "follow"),
  ("c", "b", "follow"),
  ("f", "c", "follow"),
  ("e", "f", "follow"),
  ("e", "d", "friend"),
  ("d", "a", "friend"),
  ("a", "e", "friend")
)).toDF("src", "dst", "relationship")

val g = GraphFrame(v, e) 

In your case:

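// I remember your question on distinct, but I am not sure whether we need distinct or not
// You mention an RDD, but it looks like a DataFrame; let us assume an RDD

import org.graphframes.GraphFrame
// toDF relies on the implicits that spark-shell imports automatically

// RDD of tuples, simulated here instead of being read from a file
val rdd = sc.parallelize(Array(("p1", "p2", 1),
                               ("p1", "p3", 2),
                               ("p2", "p4", 3),
                               ("p5", "p6", 4)))
// GraphFrame requires the vertex column to be named "id";
// distinct keeps each protein once, since it can occur on both sides
val v = rdd.map(x => x._1).union(rdd.map(x => x._2)).distinct.toDF("id")
// GraphFrame requires the edge endpoint columns to be named "src" and "dst"
val e = rdd.toDF("src", "dst", "similarity")

v.show(false)
e.show(false)

val g = GraphFrame(v, e)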
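
Applied to the columns shown in the question, the same approach could look roughly like the sketch below. It assumes Spark 2.x or later (a SparkSession rather than sqlContext) and the graphframes package on the classpath. The names proteins, edges, vertices, graph and strongEdges, and the 0.5 threshold minSimilarity, are made up for illustration and are not part of the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.graphframes.GraphFrame

// Assumption: Spark 2.x+ with graphframes available; local master for a quick test.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("protein-similarity-graph")
  .getOrCreate()
import spark.implicits._

// The rows from the question, typed in by hand instead of read from a file.
val proteins = Seq(
  ("P28469", "Q70UP5", 0.11111111),
  ("O45687", "P00325", 1.0),
  ("A7ME43", "Q5HG16", 0.6),
  ("A4VJT7", "Q9LD43", 1.0),
  ("P31937", "Q64415", 0.07692308),
  ("A1VAA0", "Q9L298", 1.0),
  ("B8DG74", "Q6MT35", 1.0)
).toDF("Protein1", "Protein2", "Similarity")

// Edges: GraphFrame expects the endpoint columns to be named "src" and "dst".
// The Similarity column is kept as an edge attribute.
val edges = proteins
  .withColumnRenamed("Protein1", "src")
  .withColumnRenamed("Protein2", "dst")

// Vertices: every protein that occurs on either side, once, in a column named "id".
val vertices = edges.select(col("src").as("id"))
  .union(edges.select(col("dst").as("id")))
  .distinct()

val graph = GraphFrame(vertices, edges)

// Hypothetical threshold, chosen only for illustration.
val minSimilarity = 0.5
val strongEdges = graph.edges.filter(col("Similarity") >= minSimilarity)

graph.vertices.show(false)
strongEdges.show(false)
```

Note that GraphFrames treats edges as directed, so a symmetric measure such as similarity is stored here in one direction only (Protein1 to Protein2). If only the edge list matters, GraphFrame.fromEdges(edges) should build the vertex DataFrame from src and dst without the explicit union.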

