
Find minimum and maximum of year and month in spark scala

I would like to find the minimum and the maximum of year and month from a Spark dataframe. Below is my dataframe:

code  year  month 
xx    2004  1
xx    2004  2
xxx   2004  3
xx    2004  6
xx    2011  12
xx    2018  10 

I want the minimum year and month as 2004-1 and the maximum year and month as 2018-10.

The solution I tried is:

val minAndMaxYear = dataSet.agg(min(Year), max(Year)).head()
val minYear = minAndMaxYear(0)
val maxYear = minAndMaxYear(1)
val minMonth = dataSet.select(Month).where(col(Year) === minYear).take(1)
val maxMonth = dataSet.select(Month).where(col(Year) === maxYear).take(1)

I get minYear and maxYear, but not the min and max month. Please help.

You could use struct to make tuples out of years and months and then rely on tuple ordering: tuples are compared primarily by the leftmost component, with each later component used as a tie-breaker. That way the month belonging to the minimum year is carried along automatically, instead of being looked up in a second query.

df.select(struct("year", "month") as "ym")     // pack (year, month) into one struct column
  .agg(min("ym") as "min", max("ym") as "max") // struct min/max compare field by field
  .selectExpr("stack(2, 'min', min.*, 'max', max.*) as (agg, year, month)") // unpivot to rows
  .show()

Output:

+---+----+-----+
|agg|year|month|
+---+----+-----+
|min|2004|    1|
|max|2018|   10|
+---+----+-----+
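If you want the four values as plain Scala variables rather than a displayed DataFrame, the same struct trick works with a single `agg` followed by `head()`. A minimal self-contained sketch; the local `SparkSession` setup and the inlined sample data are assumptions for illustration:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MinMaxYearMonth {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("min-max-year-month")
      .getOrCreate()
    import spark.implicits._

    // Sample data matching the question's dataframe
    val df = Seq(
      ("xx", 2004, 1), ("xx", 2004, 2), ("xxx", 2004, 3),
      ("xx", 2004, 6), ("xx", 2011, 12), ("xx", 2018, 10)
    ).toDF("code", "year", "month")

    // One aggregation: min/max over the (year, month) struct
    val row = df.agg(
      min(struct("year", "month")) as "min",
      max(struct("year", "month")) as "max"
    ).head()

    // Each aggregate is a Row with fields (year, month)
    val minYM = row.getStruct(0)
    val maxYM = row.getStruct(1)

    println(s"min: ${minYM.getInt(0)}-${minYM.getInt(1)}") // 2004-1
    println(s"max: ${maxYM.getInt(0)}-${maxYM.getInt(1)}") // 2018-10

    spark.stop()
  }
}
```

This avoids the pitfall in the original attempt: filtering by `minYear` and calling `take(1)` returns an arbitrary month from that year, not necessarily the smallest one.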
