Clojure: Scala/Java interop issues for Spark GraphX
I am trying to use Spark/GraphX from Clojure with Flambo.
Here is the code I ended up with.
In the project.clj file:
(defproject spark-tests "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [yieldbot/flambo "0.5.0"]]
  :main ^:skip-aot spark-tests.core
  :target-path "target/%s"
  :checksum :warn
  :profiles {:dev {:aot [flambo.function]}
             :uberjar {:aot :all}
             :provided {:dependencies
                        [[org.apache.spark/spark-core_2.10 "1.2.0"]
                         [org.apache.spark/spark-graphx_2.10 "1.2.0"]]}})
And then my Clojure core.clj file:
(ns spark-tests.core
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge)
           (org.apache.spark.graphx.impl GraphImpl)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")]))

(def g (new GraphImpl users relationships))
When I run that code, I get the following error:
1. Caused by java.lang.ClassCastException
   Cannot cast org.apache.spark.api.java.JavaRDD to scala.reflect.ClassTag
   Class.java: 3258 java.lang.Class/cast
   Reflector.java: 427 clojure.lang.Reflector/boxArg
   Reflector.java: 460 clojure.lang.Reflector/boxArgs
Disclaimer: I have no Scala knowledge.
I thought the cause might be that Flambo returns a JavaRDD when we use f/parallelize. So I tried to convert the JavaRDD into a plain RDD, as used in the GraphX example:
(def g (new GraphImpl (.rdd users) (.rdd relationships)))
But then I get the same error, this time for the ParallelCollectionRDD class...
From there, I have no idea what may be causing this. The Graph class is documented in both the Java API and the Scala API.
What I am not clear about is how to effectively use that class signature in Clojure:
org.apache.spark.graphx.Graph<VD,ED>
(Graph is an abstract class, but I tried using GraphImpl in this example.)
What I am trying to do is re-create that Scala example in Clojure.
Any hints would be highly appreciated!
Finally got it right (I think). Here is the code that appears to be working:
(ns spark-tests.core
  (:require [flambo.conf :as conf]
            [flambo.api :as f]
            [flambo.tuple :as ft])
  (:import (org.apache.spark.graphx Edge
                                    Graph)
           (org.apache.spark.api.java JavaRDD
                                      StorageLevels)
           (scala.reflect ClassTag$)))

(defonce c (-> (conf/spark-conf)
               (conf/master "local")
               (conf/app-name "flame_princess")))

(defonce sc (f/spark-context c))

;; Each vertex is a tuple of (vertex id, vertex attribute).
(def users (f/parallelize sc [(ft/tuple 3 ["rxin" "student"])
                              (ft/tuple 7 ["jgonzal" "postdoc"])
                              (ft/tuple 5 ["franklin" "prof"])]))

(defn edge
  [source dest attr]
  (new Edge (long source) (long dest) attr))

(def relationships (f/parallelize sc [(edge 3 7 "collab")
                                      (edge 5 3 "advisor")
                                      (edge 7 3 "advisor")]))

;; Graph/apply calls the apply method that Scala's Graph companion object exposes to Java.
(def g (Graph/apply (.rdd users)              ; RDD of (vertex id, VD) tuples
                    (.rdd relationships)      ; RDD of Edge objects
                    ["John Doe" "Missing"]    ; default vertex attribute; must be of type VD, so a vector rather than the string "collab"
                    StorageLevels/MEMORY_ONLY ; edge storage level
                    StorageLevels/MEMORY_ONLY ; vertex storage level
                    (.apply ClassTag$/MODULE$ clojure.lang.PersistentVector) ; ClassTag for VD
                    (.apply ClassTag$/MODULE$ java.lang.String)))            ; ClassTag for ED

(println (.count (.edges g)))
This code prints 3, which matches the number of edges. The main issue was that I was not creating the graph with Graph/apply. In fact, it appears that this is the way to create all the objects (it looks to be the constructor...). I have no idea why it works this way, but that is probably due to my lack of Scala knowledge. If anybody knows, just tell me why :)
After that I only had to fill in the gaps for the signature of the apply function.
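One way to see which gaps need filling is to ask the JVM for the signature directly. In Scala source, a call written as Graph(...) is shorthand for calling apply on the Graph companion object, so apply is the method Clojure has to call explicitly. The sketch below is only an illustration: static-signatures is a hypothetical helper (it is not part of Flambo or Spark) that uses plain Java reflection to list the parameter types of the public static methods with a given name. The Graph call sits inside a comment form because it assumes the GraphX jar is on the classpath.

```clojure
(import '(java.lang.reflect Method Modifier))

;; Hypothetical helper, not part of Flambo or Spark: returns a vector with one
;; entry per public static method named `method-name` on `cls`; each entry is
;; a vector of that method's parameter type names.
(defn static-signatures
  [^Class cls method-name]
  (vec (for [^Method m (.getMethods cls)
             :when (and (= method-name (.getName m))
                        (Modifier/isStatic (.getModifiers m)))]
         (mapv (fn [^Class c] (.getName c)) (.getParameterTypes m)))))

(comment
  ;; Assumes the Spark GraphX jar is on the classpath.
  (import '(org.apache.spark.graphx Graph))
  (doseq [sig (static-signatures Graph "apply")]
    (println sig)))
```

Each printed vector corresponds to one overload, so the ClassTag parameters discussed next should show up at the end of the list.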
One thing to note is the last two parameters:
scala.reflect.ClassTag<VD> evidence$17
scala.reflect.ClassTag<ED> evidence$18
These tell Scala the vertex attribute type (VD) and the edge attribute type (ED). The type of ED is the type of the object I used as the third parameter of the Edge class. The type of VD is the type of the second parameter of the tuple function.
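Since every call that takes these evidence parameters needs a ClassTag built the same way, the construction can be wrapped in a small function. This is a minimal sketch: class-tag is a hypothetical name introduced here, and it assumes the Scala library is on the classpath (it is pulled in by the Spark dependencies).

```clojure
(import '(scala.reflect ClassTag$))

;; Hypothetical helper: builds the scala.reflect.ClassTag for a Java class by
;; calling apply on the ClassTag companion object (ClassTag$/MODULE$).
(defn class-tag
  [^Class cls]
  (.apply ClassTag$/MODULE$ cls))

;; Tags for the vertex (VD) and edge (ED) attribute types used in this post.
(def vd-tag (class-tag clojure.lang.PersistentVector))
(def ed-tag (class-tag java.lang.String))
```

With this in place, the last two arguments of Graph/apply could be written as vd-tag and ed-tag.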