
Spark SQL - Update DataFrame Rows/Column values without converting to RDD

How do I update Spark SQL DataFrame row/column values without converting it to an RDD?

Why can't we update a DataFrame in place? Why does every operation, just like on an RDD, return another one?

RDDs are immutable, and a 'transformation' on an RDD can only generate a new RDD. DataFrames are wrappers around RDDs and as such suffer from the same immutability, for example:

oldDF.registerTempTable("whatever")

// The query returns a new DataFrame; oldDF itself is untouched.
// Note: calling collect() here would yield an Array[Row], which has no
// saveAsParquetFile method, so the save must be invoked on the DataFrame.
val newDF = sqlContext.sql("select field1, field2, sum(field3) as times from whatever where substring(field1,1,4)='test' group by field1, field2 having times > 100")

newDF.saveAsParquetFile("xxx.parquet")
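To directly answer the question: the idiomatic way to "update" column values without dropping to the RDD API is to derive a new DataFrame using withColumn together with the when/otherwise expressions from org.apache.spark.sql.functions. Below is a minimal sketch assuming Spark 2.x with a local SparkSession; the sample data and column names are hypothetical:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{when, col, lit}

val spark = SparkSession.builder().appName("update-example").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical input resembling the fields used in the question
val oldDF = Seq(("test-a", "x", 150), ("prod-b", "y", 30)).toDF("field1", "field2", "field3")

// "Update" field3: zero it out wherever field1 starts with "test".
// This does not mutate oldDF; it returns a brand-new DataFrame.
val newDF = oldDF.withColumn(
  "field3",
  when(col("field1").startsWith("test"), lit(0)).otherwise(col("field3"))
)

newDF.show()

Because DataFrames are immutable, every such call returns a new DataFrame; the original stays valid and unchanged, which is the same behavior the SQL example above illustrates.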
