I would like to find the minimum and the maximum (year, month) pair from a Spark DataFrame. Below is my DataFrame:
code year month
xx 2004 1
xx 2004 2
xxx 2004 3
xx 2004 6
xx 2011 12
xx 2018 10
I want the minimum year and month as 2004-1 and the maximum year and month as 2018-10.
The solution I tried is:
val minAndMaxYear = dataSet.agg(min("year"), max("year")).head()
val minYear = minAndMaxYear(0)
val maxYear = minAndMaxYear(1)
val minMonth = dataSet.select("month").where(col("year") === minYear).take(1)
val maxMonth = dataSet.select("month").where(col("year") === maxYear).take(1)
I am getting minYear and maxYear correctly, but not the min and max month. Please help.
You could use struct to make tuples out of the years and months and then rely on tuple ordering: tuples are ordered primarily by the leftmost component, with each subsequent component used as a tie-breaker.
df.select(struct("year", "month") as "ym")
.agg(min("ym") as "min", max("ym") as "max")
.selectExpr("stack(2, 'min', min.*, 'max', max.*) as (agg, year, month)")
.show()
Output:
+---+----+-----+
|agg|year|month|
+---+----+-----+
|min|2004|    1|
|max|2018|   10|
+---+----+-----+
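If you just want the two formatted strings from the question (2004-1 and 2018-10), you can read the fields back out of the aggregated structs. This is a minimal sketch, not the answer's exact code; the DataFrame and variable names are illustrative, and it assumes a Spark version where `min`/`max` accept struct columns (the same assumption the answer above relies on):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().master("local[*]").appName("minmax").getOrCreate()
import spark.implicits._

// Sample data matching the question
val df = Seq(("xx", 2004, 1), ("xx", 2004, 2), ("xxx", 2004, 3),
             ("xx", 2004, 6), ("xx", 2011, 12), ("xx", 2018, 10))
  .toDF("code", "year", "month")

// min/max over the (year, month) struct, then format each as "year-month"
val row = df
  .agg(min(struct("year", "month")) as "min", max(struct("year", "month")) as "max")
  .selectExpr("concat_ws('-', min.year, min.month) as min_ym",
              "concat_ws('-', max.year, max.month) as max_ym")
  .head()
// row.getString(0) is the minimum as "2004-1", row.getString(1) the maximum as "2018-10"
```

The struct comparison works because it mirrors ordinary tuple ordering: (2004, 1) sorts before (2004, 2), and (2011, 12) before (2018, 10).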