We have a table with columns like this:
EmpUser UserDate Empname .....
User123 20220730 Rajesh (30 rows for July)
3434Use 20220625 Gopi .... (25 rows for June)
Based on the UserDate column, we have to generate multiple rows for the respective month.
How can we write a Spark SQL query that runs in parallel so that it is faster? We have 900 million records.
Assuming you are using Databricks, the platform already parallelises and optimises your query (using Photon, or Tungsten and Catalyst).
You can of course improve the query itself, but Spark does the parallelising for you (it is essentially Spark's bread and butter).
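As a rough illustration of what "improving the query itself" could look like, here is a minimal sketch. It assumes that each input row should become one row per day, from the 1st of the month up to the day in UserDate (which matches 30 rows for 20220730 and 25 rows for 20220625). The table name employees is hypothetical, and UserDate is assumed to hold yyyyMMdd values. The Spark SQL uses only built-in functions (explode, sequence, trunc, to_date), so Spark can expand rows per partition without a UDF. The small Python function only mirrors the same logic so it can be checked locally without a cluster.

```python
from datetime import date, timedelta


def expand_to_days(user_date):
    """Return one yyyyMMdd string per day from the 1st of the month
    through the day given in user_date (assumed yyyyMMdd)."""
    end = date(int(user_date[:4]), int(user_date[4:6]), int(user_date[6:8]))
    start = end.replace(day=1)
    n_days = (end - start).days + 1
    return [(start + timedelta(days=i)).strftime("%Y%m%d") for i in range(n_days)]


# The same expansion as Spark SQL. "employees" is a hypothetical table name;
# the cast covers the case where UserDate is stored as an integer.
EXPAND_SQL = """
SELECT EmpUser,
       Empname,
       date_format(d, 'yyyyMMdd') AS UserDate
FROM (
  SELECT EmpUser,
         Empname,
         explode(
           sequence(
             trunc(to_date(CAST(UserDate AS STRING), 'yyyyMMdd'), 'MM'),
             to_date(CAST(UserDate AS STRING), 'yyyyMMdd'),
             interval 1 day
           )
         ) AS d
  FROM employees
) expanded
"""

if __name__ == "__main__":
    print(len(expand_to_days("20220730")))
    print(len(expand_to_days("20220625")))
```

With an active session the query could be run as spark.sql(EXPAND_SQL). Because the output is many times larger than the 900 million input rows, writing the result straight to a table is likely a better fit than collecting it to the driver.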