
Scala Spark Cassandra installation

1. How many ways are there to run Spark? If I just declare the dependencies in build.sbt, is Spark supposed to be downloaded and work? But if I want to run Spark locally (download the Spark tar file, winutils, ...), how can I specify in Scala code that I want to run my code against that local Spark rather than against the dependencies downloaded in IntelliJ?

2. In order to connect Spark to Cassandra, do I need a local installation of Spark? I read somewhere that it is not possible to connect from a "programmatic" Spark to a local Cassandra database.

1) Spark runs in a slightly strange way: there is your application (the Spark Driver and Executors), and there is the Resource Manager (Spark Master/Workers, Yarn, Mesos, or Local).

In your code you can run against the in-process manager (Local) by specifying the master as local or local[n]. Local mode requires no installation of Spark, as it is set up automatically inside the process you are running. This would use the dependencies you downloaded.
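
As a minimal sketch of what that looks like (the version numbers, object name, and job below are illustrative assumptions, not part of the original answer), the build.sbt dependency plus a local-master entry point could be:

    // build.sbt -- illustrative coordinates; match the versions to your Scala version
    scalaVersion := "2.12.18"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.4.1"

    // Main.scala -- Spark runs entirely inside this JVM; no separate installation
    import org.apache.spark.sql.SparkSession

    object Main {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("local-example")
          .master("local[*]")    // in-process "Local" resource manager
          .getOrCreate()
        spark.range(10).show()   // trivial job to confirm the session works
        spark.stop()
      }
    }

Running this from IntelliJ or sbt uses only the declared dependencies; nothing else needs to be installed.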

To run against a Spark Master which is running locally, you use a spark:// URL that points at your particular local Spark Master instance. Note that this will cause executor JVMs to start separately from your application, necessitating the distribution of your application code and dependencies. (Other resource managers have their own identifying URLs.)
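
For comparison, pointing the same application at a locally running Spark Master is just a change of master URL; the host and port below are the standalone-mode defaults, assumed here for illustration:

    val spark = SparkSession.builder()
      .appName("cluster-example")
      .master("spark://localhost:7077") // assumed default standalone Master URL
      .getOrCreate()

In practice you would usually package the application as a jar and launch it with spark-submit --master spark://... so that your code and dependencies can be distributed to the executors.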

2) You do not need a "Resource Manager" to connect to C* from Spark, but this ability is basically for debugging and testing. To do this you would use the local master URL. Normal Spark usage should have an external Resource Manager, because without one the system cannot be distributed.
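
A debugging/testing sketch along these lines, assuming the com.datastax.spark %% spark-cassandra-connector dependency is on the classpath and Cassandra is listening on 127.0.0.1 (the keyspace and table names are hypothetical):

    import org.apache.spark.sql.SparkSession

    object CassandraLocalTest {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("cassandra-local-test")
          .master("local[*]")                                      // no external Resource Manager
          .config("spark.cassandra.connection.host", "127.0.0.1")  // where Cassandra is running
          .getOrCreate()

        // Read a Cassandra table through the connector's DataFrame source
        val df = spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(Map("keyspace" -> "my_ks", "table" -> "my_table")) // hypothetical names
          .load()

        df.show()
        spark.stop()
      }
    }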

For some more Spark Cassandra examples, see

https://github.com/datastax/SparkBuildExamples
