
DataFrames to EdgeRDD (GraphX) using the Scala API for Spark

Is there a nice way of going from a Spark DataFrame to an EdgeRDD without hardcoding types in the Scala code? The examples I've seen use case classes to define the type of the EdgeRDD.

Let's assume that our Spark DataFrame has StructField("dstID", LongType, false) and StructField("srcID", LongType, false), plus between 0 and 22 additional StructFields (we are constraining this so that we can use a TupleN to represent them). Is there a clean way to define an EdgeRDD[TupleN] by grabbing the types from the DataFrame? As motivation, consider that we are loading a Parquet file that contains type information.

I'm very new to Spark and Scala, so I realize the question may be misguided. In this case, I'd appreciate learning the "correct" way of thinking about this problem.

Probably the simplest way is to map over the Row objects in the DataFrame (with map) and build the edges that way.
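A minimal sketch of that idea, assuming a DataFrame `df` with the "srcID" and "dstID" columns described in the question (the attribute handling here is an illustration, not a definitive answer): instead of hardcoding a TupleN, you can carry the whole Row as the edge attribute, which sidesteps the compile-time type entirely.

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row}

// Assumed: `df` has LongType columns "srcID" and "dstID",
// plus any number of extra attribute columns.
def toEdges(df: DataFrame): RDD[Edge[Row]] = {
  df.rdd.map { row =>
    // Keep the entire Row as the edge attribute, so the extra
    // columns travel along without a hardcoded TupleN type.
    Edge(row.getAs[Long]("srcID"), row.getAs[Long]("dstID"), row)
  }
}

// Usage sketch: build a graph with a default vertex attribute.
// val graph: Graph[Int, Row] = Graph.fromEdges(toEdges(df), defaultValue = 0)
```

The trade-off is that Row is untyped: you recover fields at runtime with `getAs`, rather than getting a statically typed TupleN. If static types matter, you would still need a case class or tuple known at compile time, since Scala types cannot be derived from a runtime schema without code generation.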
