In Spark 1.6, I would like to partition by one column and then order by two columns, so that I can apply rank logic within each partition.
val str = "insertdatetime,a_load_dt"
val orderByList = str.split(",")
val ptr = "memberidnum"
val partitionsColumnsList = ptr.split(",").toList
val landingDF = hc.sql("""select memberidnum,insertdatetime,'2019-09-26' as a_load_dt from landing_omega.omegamaster""")
val stagingDF = hc.sql("""select memberidnum,insertdatetime,a_load_dt from staging_omega.omegamaster where recordstatus ='current'""")
val unionedDF = landingDF.unionAll(stagingDF)
unionedDF.registerTempTable("temp_table")
val windowFunction = Window.partitionBy(partitionsColumnsList.map(elem => col(elem)):_*).orderBy(unionedDF(orderByList(0),orderByList(1)).desc)
But it throws the following error:
scala> val windowFunction = Window.partitionBy(partitionsColumnsList.map(elem => col(elem)):_*).orderBy(unionedDF(orderByList(0),orderByList(1)).desc)
<console>:56: error: too many arguments for method apply: (colName: String)org.apache.spark.sql.Column in class DataFrame
val windowFunction = Window.partitionBy(partitionsColumnsList.map(elem => col(elem)):_*).orderBy(unionedDF(orderByList(0),orderByList(1)).desc)
How do I fix this? I want to order by both columns in descending order.
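You can make the following change. `unionedDF(...)` accepts only a single column name, so build one descending Column per order-by column and pass them all to orderBy:
val windowFunction = Window.partitionBy(partitionsColumnsList.head, partitionsColumnsList.tail:_*).orderBy(orderByList.map(c => col(c).desc):_*)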
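The compiler message in the question points at `unionedDF(orderByList(0),orderByList(1))`: `DataFrame.apply` takes exactly one `colName`, so two names cannot be passed in one call. Below is a minimal sketch of one way around that, assuming every order-by column should be descending. The helper name `descWindow` is hypothetical and not part of Spark.

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.expressions.{Window, WindowSpec}
import org.apache.spark.sql.functions.col

// Hypothetical helper (not a Spark API): builds a window partitioned by
// `partitionCols` and ordered by every column in `orderCols`, descending.
def descWindow(partitionCols: Seq[String], orderCols: Seq[String]): WindowSpec = {
  // One Column per partition column name.
  val partitionExprs: Seq[Column] = partitionCols.map(c => col(c))
  // Apply .desc to each order-by column individually.
  val orderExprs: Seq[Column] = orderCols.map(c => col(c).desc)
  Window.partitionBy(partitionExprs: _*).orderBy(orderExprs: _*)
}
```

With the values from the question, this could be called as `descWindow(partitionsColumnsList, orderByList)`.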
You can use the following snippet, which applies desc to each order-by column separately:
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.expressions.Window
Window.partitionBy(partitionsColumnsList.map(col): _*)
.orderBy(orderByList.map(c => col(c).desc): _*)
If this does not work, please let me know.
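Since the goal in the question is to rank rows within each partition, here is a hedged sketch of how a window spec might then be used. The helper name `withRank` and the output column name `rnk` are assumptions for illustration. As far as I know, window functions in Spark 1.6 need a HiveContext, which the question's `hc` appears to be.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.expressions.WindowSpec
import org.apache.spark.sql.functions.rank

// Hypothetical helper: adds a rank column computed over the given window.
// "rnk" is an assumed column name, not something from the question.
def withRank(df: DataFrame, w: WindowSpec): DataFrame =
  df.withColumn("rnk", rank().over(w))
```

Called as `withRank(unionedDF, windowFunction)`, this would give rank 1 to the row with the latest `insertdatetime` (ties broken by `a_load_dt`) for each `memberidnum`; filtering on `rnk` equal to 1 would then keep only those rows.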