Spark SQL - Update DataFrame Rows/Column values without converting to an RDD

How do I update the rows/column values of a Spark SQL DataFrame without converting it to an RDD? Why can't we directly update the DataFrame the way we would an RDD, returning a new one?
RDDs are immutable, and a 'transformation' on an RDD can only generate a new RDD. DataFrames are wrappers around RDDs and therefore inherit that immutability, so you cannot update one in place. For example:
oldDF.registerTempTable("whatever")
// sqlContext.sql returns a new DataFrame; don't call .collect() first,
// since collect() yields a local Array[Row], which has no saveAsParquetFile method.
val newDF = sqlContext.sql("select field1, field2, sum(field3) as times from whatever where substring(field1,1,4)='test' group by field1, field2 having times>100")
newDF.saveAsParquetFile("xxx.parquet")
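Beyond SQL, the usual way to "update" a column is to derive a new DataFrame with `withColumn`; the original DataFrame is left untouched. A minimal sketch with hypothetical data (`field1`, `field3` are illustrative column names, and this uses the newer `SparkSession` API rather than the `sqlContext` shown above):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for the sketch; in a real job this usually already exists.
val spark = SparkSession.builder.master("local[1]").appName("sketch").getOrCreate()
import spark.implicits._

// Hypothetical input data.
val oldDF = Seq(("test1", 1), ("prod1", 2)).toDF("field1", "field3")

// withColumn returns a NEW DataFrame with field3 doubled; oldDF is unchanged.
val updatedDF = oldDF.withColumn("field3", col("field3") * 2)

updatedDF.show()
spark.stop()
```

Every such transformation (`withColumn`, `select`, `filter`, ...) follows the same pattern: it returns a new DataFrame rather than mutating the receiver.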