[英]How can I build graph (using Graphx) from Spark Dataframe?
I have already created a Spark DataFrame in order to build graph by Graphx which is Spark's API and accepts Spark Dataframe format.我已经创建了一个 Spark DataFrame,以便通过 Graphx 构建图形,Graphx 是 Spark 的 API 并接受 Spark Dataframe 格式。 So, now I have such a data,
所以,现在我有这样的数据,
+--------------------+----------------+------+
| hotel_url| author|rating|
+--------------------+----------------+------+
|Hotel_Review-g194...| violettaf340| 5|
|Hotel_Review-g194...| Lagaiuzza| 5|
|Hotel_Review-g194...| ashleyn763| 5|
|Hotel_Review-g194...| DavideMauro| 5|
|Hotel_Review-g194...| Alemma11| 4|
|Hotel_Review-g194...| ladispoli| 4|
|Hotel_Review-g303...| LiliT0URS| 3|
|Hotel_Review-g303...| Amandainldn| 4|
|Hotel_Review-g303...|TwoMonkeysTravel| 5|
|Hotel_Review-g303...| BiancaB3358| 4|
|Hotel_Review-g303...| Brett-Sweden| 4|
|Hotel_Review-g303...| analuizade| 5|
|Hotel_Review-g303...| heckfy| 5|
|Hotel_Review-g303...| MatheusMedrado| 3|
|Hotel_Review-g303...|TwoMonkeysTravel| 5|
|Hotel_Review-g303...| SaStar| 4|
|Hotel_Review-g303...| chrisbG2838DY| 4|
|Hotel_Review-g303...| virninha| 5|
|Hotel_Review-g303...| AugustusC_13| 5|
|Hotel_Review-g303...| AnnaMir| 5|
+--------------------+----------------+------+
and I would like to ask you that how to create a graph which has [ (Node: hotel_url) --- (weight: rating) --- (Node: author)] such type of relationship from the Spark Dataframe?我想问你,如何从 Spark Dataframe 创建一个具有 [ (Node: hotel_url) --- (weight: rating) --- (Node: author)] 这种类型关系的图表?
You can also understand desired relationship from the given figure.您还可以从给定的图形中了解所需的关系。
import org.apache.spark.sql.SparkSession
import org.apache.spark.graphx.Edge
import org.apache.spark.sql.types._
import org.apache.spark.graphx.Graph
import org.apache.spark.sql.functions._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
import spark.implicits._
val data = List(
("Hotel_Review-g194...", "violettaf340", 5),
("Hotel_Review-g194...", "Lagaiuzza", 5),
("Hotel_Review-g194...", "ashleyn763", 5),
("Hotel_Review-g194...", "DavideMauro", 5),
("Hotel_Review-g194...", "Alemma11", 4),
("Hotel_Review-g194...", "ladispoli", 4),
("Hotel_Review-g303...", "LiliT0URS", 3),
("Hotel_Review-g303...", "Amandainldn", 4),
("Hotel_Review-g303...", "TwoMonkeysTravel", 5),
("Hotel_Review-g303...", "BiancaB3358", 4),
("Hotel_Review-g303...", "Brett-Sweden", 4),
("Hotel_Review-g303...", "analuizade", 5),
("Hotel_Review-g303...", "heckfy", 5),
("Hotel_Review-g303...", "MatheusMedrado", 3),
("Hotel_Review-g303...", "TwoMonkeysTravel", 5),
("Hotel_Review-g303...", "SaStar", 4),
("Hotel_Review-g303...", "chrisbG2838DY", 4),
("Hotel_Review-g303...", "virninha", 5),
("Hotel_Review-g303...", "AugustusC_13", 5),
("Hotel_Review-g303...", "AnnaMir", 5)
).toDF("hotel_url", "author", "rating")
val vertices: RDD[(VertexId, String)] = data
.select(explode(array(col("hotel_url"), col("author"))))
.dropDuplicates()
.rdd
.map(_.getAs[String](0))
.zipWithIndex
.map(_.swap)
val vertDF = vertices.toDF("id", "node")
val edges = data
.join(vertDF, data.col("hotel_url") === vertDF("node"))
.select('author, 'rating.cast(StringType), 'id as 'idS)
.join(vertDF, data("author") === vertDF("node"))
.rdd
.map(row =>
Edge(
row.getAs[Long]("idS"),
row.getAs[Long]("id"),
"rating: " + row.getAs[String]("rating")
)
)
val graph = Graph(vertices, edges)
graph.vertices.foreach(println _)
// (2,Amandainldn)
// (7,heckfy)
// (5,DavideMauro)
// (0,MatheusMedrado)
// (4,ashleyn763)
// (1,LiliT0URS)
// (3,chrisbG2838DY)
// (9,Brett-Sweden)
// (11,virninha)
// (12,BiancaB3358)
// (16,AnnaMir)
// (10,TwoMonkeysTravel)
// (6,SaStar)
// (17,AugustusC_13)
// (19,ladispoli)
// (20,Alemma11)
// (14,analuizade)
// (8,Lagaiuzza)
// (18,violettaf340)
// (15,Hotel_Review-g194...)
// (13,Hotel_Review-g303...)
graph.edges.foreach(println(_))
// Edge(13,0,rating: 3)
// Edge(13,1,rating: 3)
// Edge(13,3,rating: 4)
// Edge(15,4,rating: 5)
// Edge(13,2,rating: 4)
// Edge(13,12,rating: 4)
// Edge(13,10,rating: 5)
// Edge(13,10,rating: 5)
// Edge(15,8,rating: 5)
// Edge(15,5,rating: 5)
// Edge(13,9,rating: 4)
// Edge(13,6,rating: 4)
// Edge(13,11,rating: 5)
// Edge(15,18,rating: 5)
// Edge(13,14,rating: 5)
// Edge(13,16,rating: 5)
// Edge(15,19,rating: 4)
// Edge(13,17,rating: 5)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.