
neo4j with Flink and Scala

I am processing data with Scala 2.11.7 and Flink 1.3.2. Now I would like to store the resulting org.apache.flink.api.scala.DataSet in a neo4j graph database.

Several GitHub projects offer compatibility:

  • Flink with neo4j: https://github.com/s1ck/flink-neo4j
  • Scala with neo4j: https://github.com/FaKod/neo4j-scala
  • Flink's graph library "Gelly" with Neo4j: https://github.com/albertodelazzari/gelly-neo4j

Which approach is the most promising? Or should I use neo4j's REST API directly?

(By the way: why does stackoverflow limit the number of posted links...?)
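For the direct REST route, a minimal sketch follows. It assumes the legacy transactional HTTP endpoint (db/data/transaction/commit under the same base URI used elsewhere in this post), which exists only on older Neo4j servers. The object and method names (Neo4jRestSketch, jsonString, payload, post) are hypothetical, the row shape (name, born) mirrors the Cypher query used in this post, and authentication is left out.

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets

object Neo4jRestSketch {

  // Quote a value as a JSON string, escaping the characters that must not appear raw.
  def jsonString(s: String): String =
    s.map {
      case '"'  => "\\\""
      case '\\' => "\\\\"
      case '\n' => "\\n"
      case c    => c.toString
    }.mkString("\"", "", "\"")

  // Build the request body: one statement with an "inserts" parameter holding all rows.
  def payload(cypher: String, rows: Seq[(String, Int)]): String = {
    val inserts = rows
      .map { case (name, born) => s"""{"name":${jsonString(name)},"born":$born}""" }
      .mkString("[", ",", "]")
    s"""{"statements":[{"statement":${jsonString(cypher)},"parameters":{"inserts":$inserts}}]}"""
  }

  // POST the body and return the HTTP status code. Requires a running server;
  // an Authorization header would be needed when authentication is enabled.
  def post(endpoint: String, body: String): Int = {
    val conn = new URL(endpoint).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/json")
      conn.setRequestProperty("Accept", "application/json")
      val out = conn.getOutputStream
      try out.write(body.getBytes(StandardCharsets.UTF_8)) finally out.close()
      conn.getResponseCode
    } finally conn.disconnect()
  }
}
```

With this approach the batching would have to be done by hand, for example by calling payload once per group of records inside a mapPartition and posting each body to "http://localhost:7474/db/data/transaction/commit".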

I tried flink-neo4j, but there seem to be some problems with mixing Java and Scala classes:

package dummy.neo4j

import org.apache.flink.api.common.io.OutputFormat
import org.apache.flink.api.java.io.neo4j.Neo4jOutputFormat
import org.apache.flink.api.java.tuple.{Tuple, Tuple2}
import org.apache.flink.api.scala._

object Neo4jDummyWriter {

  def main(args: Array[String]) {
    val env = ExecutionEnvironment.getExecutionEnvironment

    val outputFormat: OutputFormat[_ <: Tuple] = Neo4jOutputFormat.buildNeo4jOutputFormat.setRestURI("http://localhost:7474/db/data/")
      .setConnectTimeout(1000).setReadTimeout(1000).setCypherQuery("UNWIND {inserts} AS i CREATE (a:User {name:i.name, born:i.born})")
      .addParameterKey(0, "name").addParameterKey(1, "born").setTaskBatchSize(1000).finish

    val tuple1: Tuple = new Tuple2("abc", 1)
    val tuple2: Tuple = new Tuple2("def", 2)

    val test = env.fromElements[Tuple](tuple1, tuple2)
    println("test: " + test.getClass)
    test.output(outputFormat)
  }

}

Exception in thread "main" java.lang.ClassCastException: [Ljava.lang.Object; cannot be cast to [Lorg.apache.flink.api.common.typeinfo.TypeInformation;
    at dummy.neo4j.Neo4jDummyWriter$.main(Neo4jDummyWriter.scala:20)
    at dummy.neo4j.Neo4jDummyWriter.main(Neo4jDummyWriter.scala)

Type mismatch, expected: OutputFormat[Tuple], actual: OutputFormat[_ <: Tuple]

The solution is to keep the objects typed as Tuple2 rather than upcasting them to Tuple:

package dummy.neo4j

import org.apache.flink.api.common.io._
import org.apache.flink.api.java.io.neo4j.Neo4jOutputFormat
import org.apache.flink.api.java.tuple.Tuple2
import org.apache.flink.api.scala._

object Neo4jDummyWriter {

  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Start from plain Scala tuples and convert each one to a concrete Flink Java Tuple2,
    // instead of upcasting to the abstract Tuple base class.
    val tuple1 = ("user9", 1978)
    val tuple2 = ("user10", 1996)
    val datasetWithScalaTuples = env.fromElements(tuple1, tuple2)
    val dataset: DataSet[Tuple2[String, Int]] = datasetWithScalaTuples.map(tuple => new Tuple2(tuple._1, tuple._2))

    // The builder result is seen as OutputFormat[_ <: Tuple]; cast it to the dataset's element type.
    val outputFormat = Neo4jOutputFormat.buildNeo4jOutputFormat.setRestURI("http://localhost:7474/db/data/").setUsername("neo4j").setPassword("...")
      .setConnectTimeout(1000).setReadTimeout(1000).setCypherQuery("UNWIND {inserts} AS i CREATE (a:User {name:i.name, born:i.born})")
      .addParameterKey(0, "name").addParameterKey(1, "born").setTaskBatchSize(1000).finish.asInstanceOf[OutputFormat[Tuple2[String, Int]]]

    dataset.output(outputFormat)
    env.execute()
  }
  }

}
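
Why the cast is needed can be shown without Flink. The sketch below is a simplified model, not the library's code: Base, Pair and Sink are hypothetical stand-ins for Tuple, Tuple2 and OutputFormat. It assumes, as the type-mismatch message above suggests, that the builder result is seen from Scala as an existential type, and that the consuming method wants the exact element type because the generic interface is invariant.

```scala
object InvarianceSketch {
  // Hypothetical stand-ins for Flink's Tuple / Tuple2.
  abstract class Base
  final case class Pair(name: String, born: Int) extends Base

  // Invariant in T, like a Java generic interface seen from Scala.
  trait Sink[T] { def write(record: T): String }

  // Like the Java builder, this only promises "a sink for some subtype of Base".
  def buildSink(): Sink[_ <: Base] = new Sink[Pair] {
    def write(record: Pair): String = s"${record.name}:${record.born}"
  }

  // Accepts only a sink for exactly Pair.
  def emit(records: Seq[Pair], sink: Sink[Pair]): Seq[String] = records.map(sink.write)

  // emit(Seq(Pair("user9", 1978)), buildSink())  // does not compile: type mismatch

  // The cast is unchecked because generics are erased at runtime, so it is only
  // safe when we know which element type the sink will actually receive.
  val sink: Sink[Pair] = buildSink().asInstanceOf[Sink[Pair]]
  val written: Seq[String] = emit(Seq(Pair("user9", 1978), Pair("user10", 1996)), sink)
}
```

The same reasoning applies to the asInstanceOf[OutputFormat[Tuple2[String, Int]]] in the solution: the compiler cannot verify it, so the element type in the cast has to match the DataSet that is written.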
