
Spark pool takes a long time to start in Azure Synapse Analytics

Original question posted 2020-11-25 03:27:19. Tags: python, azure, apache-spark, pyspark, azure-synapse

I have created three different notebooks using PySpark code in Azure Synapse Analytics. The notebooks run on a Spark pool, and there is only one Spark pool for all three notebooks. When the three notebooks are run individually, the Spark pool is started for each of them by default.

The issue I am facing is related to the Spark pool: it takes 10 minutes to start for each notebook. The pool is assigned 4 vCores and 1 executor. Can somebody please help me understand how to speed up the startup of a Spark pool in Azure Synapse Analytics?

1 answer

The performance of your Apache Spark pool jobs depends on multiple factors. These performance factors include:

How your data is stored
How the cluster is configured (Small, Medium, Large)
The operations that are used when processing the data

Common challenges you might face include:

Memory constraints due to improperly sized executors
Long-running operations
Tasks that result in Cartesian operations

There are also many optimizations that can help you overcome these challenges, such as caching and allowing for data skew.
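To make the data skew point more concrete, here is a minimal plain-Python sketch of key salting, one common way to handle a skewed key. This technique is not named in the answer itself; the function names (`salt_key`, `unsalt_key`, `salted_count`) and the sample data are hypothetical and only illustrate the idea. In a real Spark job the same two-stage pattern would be applied to a DataFrame rather than a Python list.

```python
import random
from collections import Counter


def salt_key(key, num_salts, rng):
    """Append a random salt so one hot key becomes up to num_salts distinct keys."""
    return f"{key}_{rng.randrange(num_salts)}"


def unsalt_key(salted_key):
    """Strip the salt suffix to recover the original key."""
    return salted_key.rsplit("_", 1)[0]


def salted_count(keys, num_salts=4, seed=0):
    """Count records per key in two stages, as a salted aggregation would."""
    rng = random.Random(seed)
    # Stage 1: aggregate on the salted key. A hot key is now split into
    # several smaller groups that could be processed in parallel.
    partial = Counter(salt_key(k, num_salts, rng) for k in keys)
    # Stage 2: drop the salt and combine the partial results.
    final = Counter()
    for salted, n in partial.items():
        final[unsalt_key(salted)] += n
    return partial, final


# Hypothetical skewed input: one key holds most of the records.
keys = ["hot"] * 1000 + ["a"] * 10 + ["b"] * 10
partial, final = salted_count(keys)
print(final["hot"], max(partial.values()))
```

The final counts are the same as an unsalted count, but no single group in the first stage has to hold all the records of the hot key. Note that this affects job run time only; it does not change how long the pool itself takes to start.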

The article "Optimize Apache Spark jobs (preview) in Azure Synapse Analytics" describes common Spark job optimizations and recommendations.




solution1 -1 2020-11-27 06:47:13
