
spark dataframe filter and select

I have a Spark Scala DataFrame and need to filter the elements based on a condition and select the count.

  val filter = df.groupBy("user").count().alias("cnt")
  val count = filter.filter(col("user") === ("subscriber").select("cnt")

The error I am facing is value select is not a member of org.apache.spark.sql.Column. Also, for some reason count is a Dataset[Row]. Any thoughts on how to get the count in a single line?

Dataset[Row] is DataFrame

DataFrame is simply a type alias for Dataset[Row], so no need to worry; it is a DataFrame.
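A minimal sketch of this point (assuming a local SparkSession and made-up column names, not taken from the question):

```scala
import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

object AliasDemo extends App {
  // In org.apache.spark.sql, DataFrame is declared as:
  //   type DataFrame = Dataset[Row]
  // so values of the two types are interchangeable with no conversion.
  val spark = SparkSession.builder().master("local[*]").getOrCreate()
  import spark.implicits._

  val df: DataFrame = Seq(("subscriber", 3), ("guest", 1)).toDF("user", "cnt")
  val ds: Dataset[Row] = df // compiles: same underlying type

  spark.stop()
}
```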

See this for a better understanding: Difference between DataFrame, Dataset, and RDD in Spark

Regarding select is not a member of org.apache.spark.sql.Column, it is purely a compile error.

  val filter = df.groupBy("user").count().alias("cnt")
  val count = filter.filter(col("user") === ("subscriber"))
    .select("cnt")

will work, since you are missing a ), the closing brace for filter.

You are missing ")" before .select; please check the code below.

The Column class does not have a .select method; you have to invoke select on the DataFrame.

  val filter = df.groupBy("user").count().alias("cnt")
  val count = filter.filter(col("user") === "subscriber").select("cnt")
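On the "single line" part of the question: if the goal is just the number of rows for that one user, filtering first and then calling count() returns a plain Long directly, skipping the grouped intermediate DataFrame. A sketch, assuming the same df and column names as in the question:

```scala
import org.apache.spark.sql.functions.col

// Filter first, then count: yields a Long, not a Dataset[Row].
val subscriberCount: Long = df.filter(col("user") === "subscriber").count()
```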

