Spark pool taking a long time to start in Azure Synapse Analytics

I have created three different notebooks using PySpark code in Azure Synapse Analytics. All three notebooks run on the same Spark pool; there is only one pool. When each notebook is run individually, the Spark pool starts a session for that notebook by default.

The issue I am facing is with the Spark pool: it takes 10 minutes to start in each notebook. The pool is assigned 4 vCores and 1 executor. Can somebody please help me understand how to speed up the start of the Spark pool in Azure Synapse Analytics?

The performance of your Apache Spark pool jobs depends on multiple factors. These performance factors include:

  • How your data is stored
  • How the cluster is configured (Small, Medium, Large)
  • The operations that are used when processing the data.
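In a Synapse notebook, the session size (driver and executor cores, memory, executor count) can be set per notebook with the %%configure magic, whose body is JSON. The sketch below only builds the text of such a cell in plain Python, to make the settings explicit. The key names follow the Livy-style session parameters that %%configure accepts; the helper name configure_cell, giving the driver and executors identical sizes, and the 28g memory value are assumptions for illustration, so check them against your pool's node size. Note that this controls how large the session is, not how quickly the pool itself starts.

```python
import json


def configure_cell(cores: int, memory_gb: int, num_executors: int) -> str:
    """Return the text of a %%configure cell (hypothetical helper).

    Driver and executors get the same cores/memory here; that symmetry is an
    assumption for this sketch, not a requirement of Synapse.
    """
    if cores <= 0 or memory_gb <= 0 or num_executors <= 0:
        raise ValueError("cores, memory_gb and num_executors must be positive")
    body = {
        "driverMemory": f"{memory_gb}g",
        "driverCores": cores,
        "executorMemory": f"{memory_gb}g",
        "executorCores": cores,
        "numExecutors": num_executors,
    }
    # The magic goes on the first line; the JSON body follows it.
    return "%%configure\n" + json.dumps(body, indent=4)


if __name__ == "__main__":
    # Mirrors the question's setup: 4 vCores and 1 executor (28g is assumed).
    print(configure_cell(cores=4, memory_gb=28, num_executors=1))
```

The printed text would be pasted as the first cell of a notebook; running %%configure after a session has already started generally restarts that session, which adds to startup time rather than reducing it.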

Common challenges you might face include:

  • Memory constraints due to improperly sized executors.
  • Long-running operations
  • Tasks that result in cartesian operations.

There are also many optimizations that can help you overcome these challenges, such as caching and allowing for data skew.
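One common way to handle data skew is key salting: a "hot" key is split into several sub-keys so that its rows are spread over multiple tasks, aggregated partially, and then combined. The sketch below illustrates that two-phase aggregation in plain Python, with dictionaries standing in for Spark's shuffle; it is not necessarily the technique the linked article recommends, and the function name salted_sum and the num_salts parameter are hypothetical. In PySpark the same idea would be expressed with DataFrame operations (add a random salt column, group by key and salt, then group by key).

```python
import random
from collections import defaultdict


def salted_sum(records, num_salts, seed=0):
    """Sum values per key using two-phase salted aggregation (illustrative)."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    # Phase 1: aggregate by (key, salt), so one hot key becomes up to
    # num_salts smaller groups that could be processed by different tasks.
    partial = defaultdict(int)
    for key, value in records:
        partial[(key, rng.randrange(num_salts))] += value

    # Phase 2: drop the salt and combine the partial sums per original key.
    totals = defaultdict(int)
    for (key, _salt), subtotal in partial.items():
        totals[key] += subtotal
    return dict(totals), dict(partial)


if __name__ == "__main__":
    data = [("hot", 1)] * 1000 + [("cold", 2)] * 10
    totals, partial = salted_sum(data, num_salts=8)
    print(totals)
    print(sorted(k for k in partial if k[0] == "hot"))
```

The totals are the same as an unsalted group-by; the salt only changes how the work for the hot key is divided before the final combine.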

The article "Optimize Apache Spark jobs (preview) in Azure Synapse Analytics" describes common Spark job optimizations and recommendations.
