How do I update Spark SQL DataFrame row/column values without converting it to an RDD?
Why can't we update a DataFrame directly, the way an RDD transformation takes one RDD and returns another?
RDDs are immutable, so a 'transformation' on an RDD can only generate a new RDD. DataFrames are wrappers around RDDs and inherit that immutability: you never modify a DataFrame in place, you derive a new one. For example:
oldDF.registerTempTable("whatever")
// The query produces a new DataFrame; oldDF itself is never modified.
// Note: collect() would return an Array[Row], which has no saveAsParquetFile
// method, so save the DataFrame directly instead.
val newDF = sqlContext.sql(
  "select field1, field2, sum(field3) as times from whatever " +
  "where substring(field1,1,4) = 'test' group by field1, field2 having times > 100")
newDF.saveAsParquetFile("xxx.parquet")
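To answer the original question more directly: the idiomatic way to "update" values without dropping down to the RDD API is to derive a new DataFrame with `withColumn`, typically combined with `when`/`otherwise` for conditional replacement. Below is a minimal sketch assuming Spark 1.x with a `SQLContext`; the DataFrame `oldDF` and the `field1`/`field3` column names come from the example above, and the doubling rule is purely illustrative.

```scala
import org.apache.spark.sql.functions.{col, when}

// "Update" field3 conditionally: rows whose field1 starts with "test"
// get field3 doubled, all other rows keep their original value.
// This returns a NEW DataFrame; oldDF is left untouched.
val updatedDF = oldDF.withColumn(
  "field3",
  when(col("field1").startsWith("test"), col("field3") * 2)
    .otherwise(col("field3"))
)
```

The same pattern covers plain overwrites too: `oldDF.withColumn("field3", lit(0))` replaces every value in the column, again by producing a new DataFrame rather than mutating the old one.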