spark dataframe filter and select
I have a Spark Scala dataframe and need to filter the elements based on a condition and select the count.
val filter = df.groupBy("user").count().alias("cnt")
val count = filter.filter(col("user") === ("subscriber").select("cnt")
The error I am facing is: value select is not a member of org.apache.spark.sql.Column. Also, for some reason, count is a Dataset[Row]. Any thoughts on how to get the count in a single line?
DataSet[Row] is DataFrame (in the Scala API, DataFrame is simply a type alias for Dataset[Row]), so no need to worry: your result is a dataframe.

See this for a better understanding: Difference between DataFrame, Dataset, and RDD in Spark
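As a quick illustration of the point above (a minimal sketch; `df` is the asker's dataframe and `spark` an existing SparkSession is assumed), the two types are interchangeable because DataFrame is defined as a type alias:

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row}

// In org.apache.spark.sql the alias is declared as:
//   type DataFrame = Dataset[Row]
// so a Dataset[Row] value can be assigned to a DataFrame and vice versa.
val grouped: Dataset[Row] = df.groupBy("user").count()
val sameThing: DataFrame  = grouped  // compiles: the two types are identical
```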
Regarding select is not a member of org.apache.spark.sql.Column: it is purely a compile error.
val filter = df.groupBy("user").count().alias("cnt")
val count = filter.filter (col("user") === ("subscriber"))
.select("cnt")
will work, since you are missing a ) closing brace for filter.
You are missing a ")" before .select; please check the code below. The Column class does not have a .select method; you have to invoke select on a DataFrame.
val filter = df.groupBy("user").count().alias("cnt")
val count = filter.filter(col("user") === "subscriber").select("cnt")
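One caveat worth noting: `.alias("cnt")` on the grouped dataframe names the whole dataset, not the count column (which is still called `count`), so `.select("cnt")` may still fail to resolve at runtime. A sketch of a single-line alternative that sidesteps the issue, assuming you only need the number of rows where user is "subscriber":

```scala
import org.apache.spark.sql.functions.col

// Filter first, then count the matching rows.
// Dataset.count() returns the row count directly as a Long,
// so no groupBy or column selection is needed.
val subscriberCount: Long = df.filter(col("user") === "subscriber").count()
```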