spark scala "Overloaded method value select with alternatives" when trying to get the max value
df.show
+-------+-----+----------+--------+------------+----+
|     id|  val|      date|    time|         use|flag|
+-------+-----+----------+--------+------------+----+
|8200732|    1|2015-01-06|11:48:30|30065.221532|   0|
|8200733|    1|2015-01-06|11:48:40|30065.225763|   0|
|8200734|    1|2015-01-06|11:48:50|30065.229994|   0|
|8200735|    1|2015-01-06|11:49:00|30065.234225|   0|
+-------+-----+----------+--------+------------+----+
I am trying to get the average use for each date value. Here is what I try:
df.select("date",max($"use")).show()
<console>:26: error: overloaded method value select with alternatives:
[U1, U2](c1: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U1], c2: org.apache.spark.sql.TypedColumn[org.apache.spark.sql.Row,U2])org.apache.spark.sql.Dataset[(U1, U2)] <and>
(col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
(cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (String, org.apache.spark.sql.Column)
I am not sure what I am doing wrong; I have tried to rewrite this many times, but each time I get an error. I can get the max value of use over the whole DataFrame, but trying to get the max value of use for each date is causing me issues.
I cannot use SparkSQL or pySpark for this.
That's because your call does not match any of the overloaded signatures of the select method on DataFrame. The one that you're using is:
df.select("date",max($"use")).show()
And if you notice, "date" is a String literal, while max($"use") is a Column, so the call matches neither the all-String overload nor the all-Column overload. You should use the date column instead of the literal date string:
// notice the $ before date here
df.select($"date",max($"use")).show()
Here is what you should do to get the average use for each date value:
df.groupBy("date").agg(mean("use")).show()
To get the max like in the question:
df.groupBy("date").agg(max("use")).show()
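Putting it together, here is a minimal self-contained sketch of the groupBy/agg pattern. It assumes a local SparkSession (the app name and the sample rows are illustrative, mirroring the question's data); agg can also take several aggregate columns at once:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{max, mean}

val spark = SparkSession.builder()
  .appName("use-per-date")      // hypothetical app name
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Sample rows copied from the question's df.show output
val df = Seq(
  (8200732, 1, "2015-01-06", "11:48:30", 30065.221532, 0),
  (8200733, 1, "2015-01-06", "11:48:40", 30065.225763, 0),
  (8200734, 1, "2015-01-06", "11:48:50", 30065.229994, 0)
).toDF("id", "val", "date", "time", "use", "flag")

// One output row per date; both aggregates computed in a single pass
df.groupBy("date")
  .agg(mean("use").as("avg_use"), max("use").as("max_use"))
  .show()
```

Note that even the corrected select($"date", max($"use")) would fail at analysis time with "grouping expressions sequence is empty", because a non-aggregated column cannot appear next to an aggregate without a groupBy; grouping first, as above, is the right fix.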