How to connect to Neo4j in Spark worker nodes?
I need to get a small subgraph inside a Spark map function. I have tried both AnormCypher and neo4j-spark-connector, but neither works. AnormCypher leads to a Java IOException (I build the connection inside a mapPartitions function and test against a localhost server). neo4j-spark-connector causes the "Task not serializable" exception shown below.

Is there a good way to get a subgraph (or connect to a graph database such as Neo4j) from a Spark worker node?
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:298)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2094)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:793)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1.apply(RDD.scala:792)
at ....
My code snippet using neo4j-spark-connector 2.0.0-m2:
val neo = Neo4j(sc) // this runs on the driver

// this is called from a map function
def someFunctionToBeMapped(p: List[Long]) = {
  val metaGraph = neo.cypher("match p = (a:TourPlace) -[r:could_go_to] -> (b:TourPlace) " +
    "return a.id, r.distance, b.id")
    .loadRowRdd
    .map(row => ((row(0).asInstanceOf[Long], row(2).asInstanceOf[Long]), row(1).asInstanceOf[Double]))
    .collect()
    .toList
}
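One likely reason for the `Task not serializable` error is that `neo` wraps the `SparkContext`, which lives only on the driver, so a function that references `neo` cannot be shipped to the workers. A minimal sketch of one workaround, assuming the subgraph is small enough to fit in memory: run the query once on the driver, broadcast the result, and let the mapped function read only the broadcast value. The names `SubgraphBroadcast`, `EdgeKey`, `pathDistance`, and `withSubgraph`, and the idea of summing distances along `p`, are hypothetical and chosen only for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.neo4j.spark.Neo4j

object SubgraphBroadcast {
  type EdgeKey = (Long, Long)

  // Pure function, safe to run on workers: it touches only plain Scala data.
  // Hypothetical use of the subgraph: total distance along consecutive places.
  // Returns None if any consecutive pair has no edge in the subgraph.
  def pathDistance(p: List[Long], edges: Map[EdgeKey, Double]): Option[Double] =
    p.zip(p.drop(1)).foldLeft(Option(0.0)) { case (acc, edge) =>
      for (sum <- acc; d <- edges.get(edge)) yield sum + d
    }

  def withSubgraph(sc: SparkContext, paths: RDD[List[Long]]): RDD[Option[Double]] = {
    // Driver side: Neo4j(sc) is used only here, never inside a closure.
    val edges: Map[EdgeKey, Double] = Neo4j(sc)
      .cypher("MATCH (a:TourPlace)-[r:could_go_to]->(b:TourPlace) " +
        "RETURN a.id, r.distance, b.id")
      .loadRowRdd
      .map(row => ((row(0).asInstanceOf[Long], row(2).asInstanceOf[Long]),
        row(1).asInstanceOf[Double]))
      .collect()
      .toMap

    // Ship the small subgraph to every executor once.
    val edgesBc = sc.broadcast(edges)

    // Worker side: the closure captures only the broadcast handle.
    paths.map(p => pathDistance(p, edgesBc.value))
  }
}
```

If the subgraph differs per record and cannot be collected up front, the alternative is to open a plain database connection inside `mapPartitions`, so that nothing tied to the `SparkContext` is captured.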
The AnormCypher code is:
def partitionMap(partition: Iterator[List[Long]]) = {
  import org.anormcypher._
  import play.api.libs.ws._

  // Provide an instance of WSClient
  val wsclient = ning.NingWSClient()

  // Setup the Rest Client
  // Need to add the Neo4jConnection type annotation so that the default
  // Neo4jConnection -> Neo4jTransaction conversion is in the implicit scope
  implicit val connection: Neo4jConnection = Neo4jREST("127.0.0.1", 7474, "neo4j", "000000")(wsclient)

  // Provide an ExecutionContext
  implicit val ec = scala.concurrent.ExecutionContext.global

  val res = partition.filter( placeList => {
    val startPlace = Cypher("match p = (a:TourPlace) -[r:could_go_to] -> (b:TourPlace) " +
      "return p")().flatMap( row => row.data )
  })
  wsclient.close()
  res
}
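One possible cause of the IOException (an assumption, since the full trace is not shown) is that `Iterator.filter` is lazy: `res` is only evaluated when Spark consumes it, which happens after `wsclient.close()` has already run. The sketch below reproduces that pattern with the standard library only. `FakeClient` and `PartitionDemo` are hypothetical stand-ins, not AnormCypher or Play classes.

```scala
import java.io.IOException

// Hypothetical stand-in for a network client such as WSClient.
final class FakeClient {
  private var open = true

  // Pretend to query a server; fails once the client has been closed.
  def lookup(id: Long): Boolean = {
    if (!open) throw new IOException("client already closed")
    id % 2 == 0
  }

  def close(): Unit = { open = false }
}

object PartitionDemo {
  // Mirrors the shape of partitionMap: the lazy iterator escapes after close().
  def lazyVersion(partition: Iterator[Long]): Iterator[Long] = {
    val client = new FakeClient
    val res = partition.filter(client.lookup) // nothing is evaluated yet
    client.close()
    res // consumed later, when the client is already closed
  }

  // Materialize the results before closing, then hand back a fresh iterator.
  def strictVersion(partition: Iterator[Long]): Iterator[Long] = {
    val client = new FakeClient
    try partition.filter(client.lookup).toList.iterator
    finally client.close()
  }
}
```

Consuming the result of `lazyVersion` throws the `IOException`, while `strictVersion` returns the filtered elements. Materializing with `toList` holds one partition's results in memory, which is reasonable only when partitions are small.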
I used Spark standalone mode and was able to connect to the Neo4j database.
Versions used:
Spark 2.1.0
neo4j-spark-connector 2.1.0-m2
My code:
val sparkConf = new SparkConf().setAppName("Neo$j").setMaster("local")
val sc = new SparkContext(sparkConf)
println("***Getting Started ****")
val neo = Neo4j(sc)
val rdd = neo.cypher("MATCH (n) RETURN id(n) as id").loadDataFrame
println(rdd.count)
Spark submit:
spark-submit --class package.classname --jars pathofneo4jsparkconnectoryJAR --conf spark.neo4j.bolt.password=***** targetJarFile.jar
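The connection settings can also be supplied in code instead of on the command line. A minimal sketch, assuming the connector's `spark.neo4j.bolt.url` and `spark.neo4j.bolt.user` keys alongside the `spark.neo4j.bolt.password` key used in the command above. The object name `Neo4jConfigExample`, the Bolt URL, the user name, and the `NEO4J_PASSWORD` environment variable are placeholders, not values from this setup.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.neo4j.spark.Neo4j

object Neo4jConfigExample {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf()
      .setAppName("Neo4j")
      .setMaster("local")
      // Placeholder connection settings; adjust to the actual server.
      .set("spark.neo4j.bolt.url", "bolt://localhost:7687")
      .set("spark.neo4j.bolt.user", "neo4j")
      // Read the password from the environment rather than hard-coding it.
      .set("spark.neo4j.bolt.password", sys.env.getOrElse("NEO4J_PASSWORD", ""))

    val sc = new SparkContext(sparkConf)
    try {
      // Same query as in the code above, run on the driver.
      val df = Neo4j(sc).cypher("MATCH (n) RETURN id(n) as id").loadDataFrame
      println(df.count)
    } finally {
      sc.stop()
    }
  }
}
```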