
Spark SQL - Update DataFrame Rows/Column values without converting as RDD

How do I update row/column values in a Spark SQL DataFrame without converting it to an RDD?

Why can't we update the DataFrame directly, the way we might mutate a local collection? Why does every operation return another DataFrame instead?

RDDs are immutable, and a 'transformation' on an RDD can only generate a new RDD. DataFrames are wrappers around RDDs and inherit that immutability, so any "update" is really a transformation that produces a new DataFrame. For example:

oldDF.registerTempTable("whatever") 

val newDF = sqlContext.sql("select field1, field2, sum(field3) as times from whatever where substring(field1,1,4)='test' group by field1, field2 having times > 100")

newDF.saveAsParquetFile("xxx.parquet")

(Note that saveAsParquetFile must be called on the DataFrame itself; calling collect() first would return a local Array[Row], which has no such method.)
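Besides SQL, the usual way to "update" a column is DataFrame.withColumn, which likewise leaves the original untouched and returns a new DataFrame. A minimal sketch, assuming a Spark 1.x context and the hypothetical columns field1 and field3 from the query above:

```scala
import org.apache.spark.sql.functions.{when, col, lit}

// Hypothetical "update": set field3 to 0 wherever field1 starts with "test".
// withColumn does not modify oldDF; it returns a brand-new DataFrame in which
// the column named "field3" is replaced by the given expression.
val updatedDF = oldDF.withColumn(
  "field3",
  when(col("field1").startsWith("test"), lit(0)).otherwise(col("field3"))
)
```

Chaining several withColumn calls (or one select with expressions) is the idiomatic substitute for in-place row updates.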

