![](/img/trans.png)
[英]Can't query Spark DF from Hive after `saveAsTable` - Spark SQL specific format, which is NOT compatible with Hive
[英]moving transformations from hive sql query to Spark
val temp = sqlContext.sql(s"SELECT A, B, C, (CASE WHEN (D) in (1,2,3) THEN ((E)+0.000)/60 ELSE 0 END) AS Z from TEST.TEST_TABLE")
val temp1 = temp.map({ temp => ((temp.getShort(0), temp.getString(1)), (USAGE_TEMP.getDouble(2), USAGE_TEMP.getDouble(3)))})
.reduceByKey((x, y) => ((x._1+y._1),(x._2+y._2)))
而不是上面的代碼在蜂巢層上進行計算(案例評估),我想在scala中完成轉換。 我該怎么辦?
在Map中填充數據時是否可以做同樣的事情?
val temp = sqlContext.sql(s"SELECT A, B, C, D, E from TEST.TEST_TABLE")
val tempTransform = temp.map(row => {
val z = List[Double](1, 2, 3).contains(row.getDouble(3)) match {
case true => row.getDouble(4) / 60
case _ => 0
}
Row(row.getShort(0), Row.getString(1), Row.getDouble(2), z)
})
val temp1 = tempTransform.map({ temp => ((temp.getShort(0), temp.getString(1)), (USAGE_TEMP.getDouble(2), USAGE_TEMP.getDouble(3)))})
.reduceByKey((x, y) => ((x._1+y._1),(x._2+y._2)))
您也可以使用此語法
new_df = old_df.withColumn('target_column', udf(df.name))
如本例所述
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._ // for `toDF` and $""
import org.apache.spark.sql.functions._ // for `when`
val df = sc.parallelize(Seq((4, "blah", 2), (2, "", 3), (56, "foo", 3), (100, null, 5)))
.toDF("A", "B", "C")
val newDf = df.withColumn("D", when($"B".isNull or $"B" === "", 0).otherwise(1))
在您的情況下,執行sql,其數據幀如下val temp = sqlContext.sql(s"SELECT A, B, C, D, E from TEST.TEST_TABLE")
並應用withColumn
與案件或when
otherwise
或在必要火花udf
,請調用scala函數邏輯而不是hiveudf
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.