Can you use spark-shell programmatically?
Is it possible to run a spark-shell from a Java or Scala program? In other words, can I start a spark-shell session inside a Java program, pass Spark code to it, read back the response, and continue the interaction from inside the code?
If you want to use spark-shell, you can always launch it from Java and then capture its stdin and stdout to pass text in and read responses back:
// Launch spark-shell as an external process and grab its standard streams
Process process = Runtime.getRuntime().exec("spark-shell");
OutputStream stdin  = process.getOutputStream(); // write commands here
InputStream  stderr = process.getErrorStream();
InputStream  stdout = process.getInputStream();  // read the shell's output here
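From there the interaction is just writing lines to stdin and reading lines back from stdout. A rough sketch follows; the wrapper names and the naive stop condition are illustrative assumptions, and real code would have to parse spark-shell's prompts and output more carefully:

// Continues from the streams above; requires java.io imports
PrintWriter toShell = new PrintWriter(new OutputStreamWriter(stdin), true);
BufferedReader fromShell = new BufferedReader(new InputStreamReader(stdout));

toShell.println("sc.parallelize(1 to 100).sum()"); // any Scala/Spark expression

String line;
while ((line = fromShell.readLine()) != null)
{
    System.out.println(line);     // echo the shell's response
    if (line.contains("res"))     // naive heuristic: stop after the first printed result
        break;
}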
But there is actually no reason to do so. Spark-shell is mostly for learning and testing. Everything you can do from the shell you can also do from a Java app, even interactively.

Consider the following example: you want to count error lines and, if there are more than 100, ask the user whether to display them at the console. If there are fewer than 100, display them anyway:
import org.apache.spark.api.java.JavaRDD;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;

// Keep only the log lines that contain "error"
JavaRDD<String> lines = sc.textFile("hdfs://log.txt").filter(s -> s.contains("error"));

if (lines.count() > 100)
{
    System.out.println("Errors are more than 100, do you wish to display them? (y/n)");
    BufferedReader br = new BufferedReader(new InputStreamReader(System.in));
    if (br.readLine().equals("y"))
    {
        // Bring the matching lines back to the driver and print them
        List<String> errors = lines.collect();
        for (String s : errors)
            System.out.println(s);
    }
}
else
{
    List<String> errors = lines.collect();
    for (String s : errors)
        System.out.println(s);
}
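The example assumes an already initialized JavaSparkContext named sc. A minimal sketch of how it might be created; the app name and master below are placeholder values, not part of the original answer:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

// Placeholder setup for the JavaSparkContext used in the example above
SparkConf conf = new SparkConf().setAppName("ErrorCount").setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(conf);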
This is a working solution on top of Spark 1.6.0 and Scala 2.10. Create a SparkIMain with Settings and bind the variables and values together with their types:
import org.apache.spark.repl.SparkIMain
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import scala.tools.nsc.GenericRunnerSettings

class TestMain {
  def exec(): Unit = {
    // Interpreter settings: report REPL feedback via println and use the Java classpath
    val settings = new GenericRunnerSettings(println _)
    settings.usejavacp.value = true
    val interpreter = new SparkIMain(settings)

    val conf = new SparkConf().setAppName("TestMain").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // The Spark code to run inside the embedded interpreter
    val methodChain =
      """
      val df = sqlContext.read
        .format("com.databricks.spark.csv")
        .option("header", "false")
        .option("inferSchema", "true")
        .option("treatEmptyValuesAsNulls", "true")
        .option("parserLib", "univocity")
        .load("example-data.csv")

      df.show()
      """

    // Expose the driver's SQLContext to the interpreted code under the name "sqlContext"
    interpreter.bind("sqlContext", "org.apache.spark.sql.SQLContext", sqlContext)

    // Run the snippet; the returned flag tells whether interpretation succeeded
    val resultFlag = interpreter.interpret(methodChain)
  }
}

object TestInterpreter {
  def main(args: Array[String]) {
    val testMain = new TestMain()
    testMain.exec()
    System.exit(0)
  }
}
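The value returned by interpret can be inspected to decide what to do next (in the Scala 2.10 REPL API it is a Results.Result such as Success, Error or Incomplete; treat that detail as an assumption for your exact Spark and Scala versions), and you can keep calling interpreter.bind and interpreter.interpret with further snippets to continue the interaction from your own program.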