How to pass hiveContext as argument to functions spark scala

I have created a hiveContext in the main() function in Scala, and I need to pass this hiveContext as a parameter to other functions. This is the structure:

object Project {
    def main(name: String): Int = {
      val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
      ... 
    } 
    def read (streamId: Int, hc:hiveContext): Array[Byte] = {
    ... 
    } 
    def close (): Unit = {
    ...
    }
 }

But it doesn't work. The function read() is called inside main().

Any ideas?

I'm declaring hiveContext as implicit; this works for me:

implicit val sqlContext: HiveContext = new HiveContext(sc)
MyJob.run(conf)

Defined in MyJob:

override def run(config: Config)(implicit sqlContext: SQLContext): Unit = ...
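The implicit mechanism here is plain Scala, independent of Spark: when a method's last parameter list is marked implicit, the compiler fills it in with a matching implicit value visible at the call site. A minimal, Spark-free sketch (the Ctx class and names are made up for illustration):

```scala
// Stand-in for a context object such as HiveContext (hypothetical class).
class Ctx(val name: String)

object ImplicitDemo {
  // The implicit parameter list is filled in by the compiler
  // from an implicit value in scope at the call site.
  def run(job: String)(implicit ctx: Ctx): String =
    s"running $job on ${ctx.name}"

  def main(args: Array[String]): Unit = {
    implicit val ctx: Ctx = new Ctx("hive")
    println(run("myJob"))      // compiler supplies ctx implicitly
    println(run("myJob")(ctx)) // the same call, passing ctx explicitly
  }
}
```

As the explicit call shows, an implicit parameter list can always be filled in by hand, which is exactly the non-implicit variant below.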

But if you don't want it implicit, passing it explicitly works the same way:

val sqlContext: HiveContext = new HiveContext(sc)
MyJob.run(conf)(sqlContext)

override def run(config: Config)(sqlContext: SQLContext): Unit = ...

Also, your function read should declare HiveContext (the class) as the type of the parameter hc, not hiveContext (the value name):

def read (streamId: Int, hc:HiveContext): Array[Byte] = 

I tried several options; this is what eventually worked for me.

object SomeName extends App {

val conf = new SparkConf()...
val sc = new SparkContext(conf)

implicit val sqlC = SQLContext.getOrCreate(sc)
getDF1(sqlC)

def getDF1(sqlCo: SQLContext): Unit = {
    val query1 = "SomeQuery here" // placeholder for the actual query
    val df1 = sqlCo.read.format("jdbc").options(Map("url" -> dbUrl,"dbtable" -> query1)).load.cache()

 //iterate through df1 and retrieve the 2nd DataFrame based on some values in the Row of the first DataFrame

  df1.foreach(x => {
    getDF2(x.getString(0), x.getDecimal(1).toString, x.getDecimal(3).doubleValue) (sqlCo)
  })     
}

def getDF2(a: String, b: String, c: Double)(implicit sqlCont: SQLContext) :  Unit = {
  val query2 = "Somequery" // placeholder for the actual query

  val sqlcc = SQLContext.getOrCreate(sc)
  //val sqlcc = sqlCont //Did not work for me. Also, omitting (implicit sqlCont: SQLContext) altogether did not work
  val df2 = sqlcc.read.format("jdbc").options(Map("url" -> dbURL, "dbtable" -> query2)).load().cache()
   // ...
 }
}

Note: In the above code, if I omitted the (implicit sqlCont: SQLContext) parameter from the getDF2 method signature, it would not work. I tried several other ways of passing the sqlContext from one method to the other; they always gave me a NullPointerException or a Task not serializable exception. The good thing is that it eventually worked this way, and I could retrieve parameters from a row of DataFrame1 and use those values when loading DataFrame 2.
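A likely reason the commented-out variant failed: the body of df1.foreach runs on the executors, and a SQLContext (and the SparkContext behind it) is generally not usable there, so closures that capture it tend to throw Task not serializable or hit null on the workers. If the number of outer rows is small, a common workaround is to bring them to the driver first and issue the per-row queries from there. A hedged sketch under the same assumptions and placeholders as the code above (dbUrl, the queries, and the column positions come from the original snippet):

```scala
// Sketch: run the per-row lookups on the driver instead of inside foreach.
// collect() pulls the rows of df1 to the driver, so getDF2 and its
// SQLContext are only ever used in driver-side code.
df1.collect().foreach { x =>
  getDF2(x.getString(0), x.getDecimal(1).toString, x.getDecimal(3).doubleValue)(sqlC)
}
```

This only makes sense when df1 is small enough to fit on the driver; for large outer results, a join between the two data sources is usually the better shape.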
