
Can you use spark-shell programmatically?

Is it possible to run a spark-shell from a Java or Scala program? In other words, start a spark-shell session inside a Java program, pass Spark code to it, read back the response, and continue the interaction from the code.

If you want to use spark-shell, you can always call it from Java and then capture its stdin and stdout to pass text in and get responses back:

// Launch spark-shell as a child process; its streams give us a way to talk to it
Process process = Runtime.getRuntime().exec("spark-shell");

// From our side: write to stdin to send input, read stdout/stderr to get its output
OutputStream stdin  = process.getOutputStream();
InputStream  stderr = process.getErrorStream();
InputStream  stdout = process.getInputStream();
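
For example, a self-contained sketch that starts the shell, sends it a line of Scala, and reads back whatever it prints could look like the following. This is only illustrative: it assumes spark-shell is on the PATH, and sc.version and :quit are just sample input.

import java.io.*;

public class SparkShellDriver {
    public static void main(String[] args) throws IOException {
        // Start spark-shell as a child process (assumes it is on the PATH)
        Process process = Runtime.getRuntime().exec("spark-shell");

        BufferedWriter toShell =
            new BufferedWriter(new OutputStreamWriter(process.getOutputStream()));
        BufferedReader fromShell =
            new BufferedReader(new InputStreamReader(process.getInputStream()));

        // Send an expression to the REPL, then :quit so the process terminates
        toShell.write("sc.version\n");
        toShell.write(":quit\n");
        toShell.flush();

        // Read everything the shell printed, including the evaluated result
        String line;
        while ((line = fromShell.readLine()) != null) {
            System.out.println(line);
        }
    }
}

Keep in mind that you are then parsing the shell's textual output, which is fragile.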

But there is actually no reason to do so. spark-shell is mostly for learning and testing. Everything you can do from the shell you can also do from a Java app, even interactively.

Consider the following example: you want to count errors, and if there are more than 100 of them, ask the user whether to display them at the console. If there are fewer than 100, display them anyway:

// "sc" is an existing JavaSparkContext; keep only the lines that contain "error"
JavaRDD<String> lines = sc.textFile("hdfs://log.txt").filter(s -> s.contains("error"));

if (lines.count() > 100)
{
    System.out.println("Errors are more than 100, do you wish to display them? (y/n)");

    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    if (br.readLine().equals("y"))
    {
        // Bring the matching lines back to the driver and print them
        List<String> errors = lines.collect();
        for (String s : errors)
            System.out.println(s);
    }
}
else
{
    List<String> errors = lines.collect();
    for (String s : errors)
        System.out.println(s);
}

This is a working solution on top of Spark 1.6.0 and Scala 2.10. Create a SparkIMain with Settings, then bind the variables and values together with their types:

import org.apache.spark.repl.SparkIMain
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

import scala.tools.nsc.GenericRunnerSettings

class TestMain {
  def exec(): Unit = {
    // Compiler settings for the embedded interpreter; reuse the Java classpath
    val settings = new GenericRunnerSettings(println _)
    settings.usejavacp.value = true
    val interpreter = new SparkIMain(settings)

    val conf = new SparkConf().setAppName("TestMain").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // The Spark code to be interpreted, as a plain string
    val methodChain =
      """
        val df = sqlContext.read
              .format("com.databricks.spark.csv")
              .option("header", "false")
              .option("inferSchema", "true")
              .option("treatEmptyValuesAsNulls", "true")
              .option("parserLib", "univocity")
              .load("example-data.csv")

        df.show()
      """

    // Expose the already-created SQLContext to the interpreted code as "sqlContext"
    interpreter.bind("sqlContext", "org.apache.spark.sql.SQLContext", sqlContext)
    val resultFlag = interpreter.interpret(methodChain)
  }
}

object TestInterpreter {

  def main(args: Array[String]): Unit = {
    val testMain = new TestMain()
    testMain.exec()
    System.exit(0)
  }
}
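
A usage note, assuming SparkIMain follows the standard Scala REPL interpreter API it builds on: the value returned by interpret (assigned to resultFlag above) indicates whether the snippet compiled and ran successfully, so it can be checked before sending further snippets. Additional objects such as sc can be exposed to the interpreted code with further bind calls, in the same way as sqlContext.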
