

How to compile a spark-cassandra program using Scala?

Lately I started learning Spark and Cassandra. I know that Spark can be used from Python, Scala, and Java, and I've read the docs on this website: https://github.com/datastax/spark-cassandra-connector/blob/master/doc/0_quick_start.md . The thing is, after I create a program named testfile.scala with the code from that document (I don't know if using .scala is right), I don't know how to compile it. Can anyone guide me on what to do with it? Here is testfile.scala:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}
import akka.actor.Props
import com.datastax.spark.connector._
import com.datastax.spark.connector.streaming._

// Point the connector at a local Cassandra node
val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")

// Connect to a standalone Spark master
val sc = new SparkContext("spark://127.0.0.1:7077", "test", conf)

// Reuse sc here; building a second context from conf would clash with it.
// n (the batch interval in seconds) is a placeholder from the docs.
val ssc = new StreamingContext(sc, Seconds(n))

// SimpleStreamingActor and actorName are also placeholders from the docs
val stream = ssc.actorStream[String](Props[SimpleStreamingActor], actorName, StorageLevel.MEMORY_AND_DISK)

// Word count over the stream, saved to the streaming_test.words table
// (saveToCassandra returns Unit, so there is nothing useful to bind)
stream.flatMap(_.split("\\s+")).map(x => (x, 1)).reduceByKey(_ + _).saveToCassandra("streaming_test", "words", SomeColumns("word", "count"))

// Read the test.kv table back as an RDD
val rdd = sc.cassandraTable("test", "kv")

println(rdd.count)
println(rdd.first)
println(rdd.map(_.getInt("value")).sum)

Scala projects are compiled by scalac, but it's quite low-level: you have to set up build paths and manage all dependencies yourself, so most people fall back on a build tool such as sbt, which manages a lot of that for you. The other two commonly used build tools are maven, which is favored by Java old-schoolers, and gradle, which is more down to earth.
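To make that concrete, here is a minimal build.sbt sketch for a project like this; the project name and Scala version are illustrative placeholders, not taken from your setup:

// build.sbt at the project root (dependencies are covered below)
name := "spark-cassandra-test"

version := "0.1"

scalaVersion := "2.10.4"

With this file in the project root and your code under src/main/scala, running sbt compile fetches dependencies and compiles everything, and sbt package produces a jar you can hand to spark-submit.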

> how to import spark-cassandra-connector

I've set up an example project. Basically, you define all of your dependencies in build.sbt or its analog; that is where the dependency on spark-cassandra-connector is defined (line #12).
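For reference, the dependency declarations in such a build.sbt could look like the sketch below; the version numbers are illustrative and should match the Spark version you actually run against:

// Spark artifacts are "provided": the cluster supplies them at runtime,
// so they stay out of your packaged jar
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.2.1" % "provided",
  "org.apache.spark" %% "spark-streaming" % "1.2.1" % "provided",
  "com.datastax.spark" %% "spark-cassandra-connector" % "1.2.0"
)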

> And, is it a rule that we have to code with class or object

Yes and no. If you build with sbt, all your code files have to be wrapped into an object, but sbt lets you code in its shell, and code you input there is not required to be wrapped (the same rules as in the ordinary Scala REPL). Next, both IDEA and Eclipse have worksheet capabilities, so you can create test.sc and draft your code there.
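As a rough sketch of what that wrapping means for your file, testfile.scala compiled through sbt would look something like this (the object name is arbitrary, and the streaming placeholders from the quick-start doc are left out):

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// sbt-compiled code must live inside an object (or class)
object TestFile {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext("spark://127.0.0.1:7077", "test", conf)

    // Same table read as in the question
    val rdd = sc.cassandraTable("test", "kv")
    println(rdd.count)

    sc.stop()
  }
}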
