Spark DataFrame not using workers

Question

I have a Spark cluster with 3 worker nodes. When I load a CSV file from HDFS, only the machine where I run spark-shell (the master node) uses any resources (CPU and memory).

Load the DataFrame:

val df = spark.read.format("csv")
  .option("header", "true")
  .load("hdfs://ipaddr:9000/user/smb_ram/2016_HDD.csv")

Do some operation on the DataFrame:

df.agg(sum("failure")).show

When I load the CSV, memory usage on that machine increases by 1.3 GB, which is the size of the HDFS file, and CPU usage goes to 100%. The workers sit idle, with CPU near 0% and no change in memory usage. I would expect the workers to do all the heavy lifting, but that is not happening.

Answer 1

Set the Spark mode to cluster; that should solve your problem. It looks like your job is running in client mode.
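
Before changing anything, it may help to confirm what the shell is actually connected to. The sketch below is only an illustration: `describeMaster` is a hypothetical helper (not part of Spark) that interprets the string returned by `spark.sparkContext.master`. Note that cluster deploy mode applies to applications launched with spark-submit, not to interactive shells, so for spark-shell the setting to look at is most likely the `--master` URL. The host name in the comment is a placeholder, and the whole thing assumes a standalone cluster.

```scala
// Hypothetical helper (not part of Spark): interpret a master URL string.
def describeMaster(master: String): String =
  if (master.startsWith("local"))
    "local mode: every task runs inside the driver JVM on this machine"
  else if (master.startsWith("spark://"))
    "standalone cluster: tasks run in executors on the worker nodes"
  else if (master.startsWith("yarn"))
    "YARN: tasks run in executors inside YARN containers"
  else
    s"other cluster manager: $master"

// In spark-shell, where `spark` is predefined:
//   println(describeMaster(spark.sparkContext.master))
//
// If this reports local mode, relaunch the shell against the cluster, e.g.
//   spark-shell --master spark://<master-host>:7077
// (<master-host> is a placeholder; 7077 is the standalone master's default port)
```

If the shell turns out to be in local mode, that would explain the symptoms in the question: the driver machine reads and aggregates the whole file itself while the workers stay idle.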
