
PySpark too slow in Google Cloud Dataproc

I deployed a PySpark ML model to a Google Cloud Dataproc cluster and it has been running for over an hour, even though my data is only about 800 MB.

Do I need to declare anything as the master on my SparkSession? I set it to the option 'local'.
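For reference, the session setup in question presumably looks something like the sketch below; the app name is an assumption, only the 'local' master is stated above:

```python
from pyspark.sql import SparkSession

# Hypothetical reconstruction of the setup described in the question:
# hard-coding master("local") forces the job to run on a single VM.
spark = (
    SparkSession.builder
    .appName("ml-model")   # illustrative name, not from the question
    .master("local")       # the setting the question refers to
    .getOrCreate()
)
```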

When you pass the 'local' master option to SparkContext, your application executes locally on a single VM. To avoid this, do not pass any master option in the SparkContext or SparkSession constructor: the application will then pick up the properties pre-configured by Dataproc and run on YARN, utilizing all cluster resources/nodes.
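A minimal sketch of the fix, assuming the session is built with the standard SparkSession.builder API: simply omit .master(...) so the cluster's pre-configured spark.master (YARN on Dataproc) takes effect.

```python
from pyspark.sql import SparkSession

# No .master(...) here: on Dataproc, spark-defaults.conf already sets
# spark.master=yarn, so the job is distributed across the worker nodes
# instead of running inside a single VM.
spark = (
    SparkSession.builder
    .appName("ml-model")  # illustrative name
    .getOrCreate()
)
```

Submitting the script with `gcloud dataproc jobs submit pyspark your_script.py --cluster=your-cluster` (script and cluster names are placeholders) then runs it on YARN with the cluster's resources.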
