Manipulation on Spark Dataframe row
I am new to Spark, Scala, etc. Below is my code:
val eventdf = sqlContext.sql("SELECT sensor, data.actor FROM eventTable")
eventdf.map {
  case (r) => (r.getString(0) + count, r.getString(1), count)
}.saveToCassandra("caliper", "event", SomeColumns("sensor", "sendtime", "count"))
Here I want to perform some operations on r.getString(1) and then pass the result to Cassandra to be saved.
If you cannot apply the transformation directly to the DataFrame column, you could do the following:
import org.apache.spark.sql.Row
import sqlContext.implicits._
val newRDD = eventdf.map {
  case Row(val1: String, val2: String) =>
    // process val2 here and save the result to val2_processed
    val val2_processed = val2 // replace with your actual processing
    (val1 + count, val2_processed, count)
}
val newDF = newRDD.toDF("col1", "col2", "col3") // If you need to convert it back to DF
newDF.saveToCassandra(...)
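Putting the pieces together, a minimal end-to-end sketch might look like the following. This assumes the spark-cassandra-connector is on the classpath, uses `toUpperCase` as a stand-in for the real per-row processing, and invents a placeholder value for `count`, which the question never shows being defined:

```scala
import org.apache.spark.sql.Row
import com.datastax.spark.connector._ // provides saveToCassandra / SomeColumns
import sqlContext.implicits._         // provides .toDF on the mapped RDD

val count = 1 // placeholder; the original code does not show where count comes from

val eventdf = sqlContext.sql("SELECT sensor, data.actor FROM eventTable")

val newRDD = eventdf.map {
  case Row(sensor: String, actor: String) =>
    val processedActor = actor.toUpperCase // stand-in for the real processing of column 1
    (sensor + count, processedActor, count)
}

newRDD.toDF("sensor", "sendtime", "count")
  .saveToCassandra("caliper", "event", SomeColumns("sensor", "sendtime", "count"))
```

Note that in pre-2.0 Spark (where `sqlContext` is used like this), `DataFrame.map` returns an RDD, which is why the result must be converted back with `toDF` before saving, as shown in the answer above.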