Spark SQL: generate multiple rows from one row based on a date column

We have a table with columns like these:

EmpUser   UserDate Empname .....
User123   20220730 Rajesh      (30 rows for July)
3434Use   20220625 Gopi ....   (25 rows for June)

Based on the UserDate column, we have to generate multiple rows for the respective month (for example, 30 rows for 20220730 and 25 rows for 20220625).

How can we write Spark SQL that runs in parallel so this is handled faster? We have 900 million records.

Assuming you are using Databricks, your query is already parallelised and optimised for you (using Photon, or Tungsten + Catalyst).

Of course you can improve the query itself, but Spark does the parallelising for you (it is essentially Spark's bread and butter).
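
As a rough sketch of what the query itself could look like, the following uses Spark SQL's built-in `sequence` and `explode` functions (available since Spark 2.4) to produce one row per day from the first of the month up to `UserDate`. The table name `emp_table` and the output column `GeneratedDate` are placeholders, not taken from the question, and the sketch assumes `UserDate` holds values in `yyyyMMdd` form (string or integer) and that "30 rows for July" means days 1 through 30.

```sql
-- Assumption: emp_table is a placeholder for the real table name.
-- Assumption: UserDate is stored as yyyyMMdd (e.g. 20220730).
SELECT
  EmpUser,
  Empname,
  UserDate,
  -- sequence() builds an array of dates from the first day of the month
  -- up to UserDate; explode() turns that array into one row per date.
  explode(sequence(trunc(d, 'MM'), d, interval 1 day)) AS GeneratedDate
FROM (
  SELECT
    *,
    to_date(CAST(UserDate AS STRING), 'yyyyMMdd') AS d
  FROM emp_table
) t
```

Since each input row is expanded independently, this should not need a shuffle, so Spark can process the partitions in parallel without any extra hints. Note that the output is far larger than the 900 million input rows, so writing the result is likely to be the expensive part.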
