
Issue with dynamic allocation in PySpark session (under MLRun and in K8s)

I would like to maximize the power of the Spark cluster in the MLRun solution for my calculation, so I used the following session settings for the Spark cluster in MLRun (it runs on a Kubernetes cluster):

spark = SparkSession.builder.appName('Test-Spark') \
    .config("spark.dynamicAllocation.enabled", True) \
    .config("spark.shuffle.service.enabled", True) \
    .config("spark.executor.memory", "12g") \
    .config("spark.executor.cores", "4") \
    .config("spark.dynamicAllocation.minExecutors", 3) \
    .config("spark.dynamicAllocation.maxExecutors", 6) \
    .config("spark.dynamicAllocation.initialExecutors", 5) \
    .getOrCreate()

The issue is that I cannot utilize the cluster's full capacity; in many cases only 1, 2, or 3 executors with a small number of cores are used.

Do you know how to get the Spark session to use more resources and deliver higher performance? (It seems that dynamic allocation does not work correctly with MLRun & K8s & Spark.)
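For context on what may be going wrong here: on Kubernetes, Spark's classic external shuffle service is generally not available, so setting `spark.shuffle.service.enabled=true` by itself does not make dynamic allocation functional. Since Spark 3.0 the usual alternative is shuffle tracking via `spark.dynamicAllocation.shuffleTracking.enabled`. The sketch below is an assumption to verify against your Spark and MLRun versions, not a confirmed fix:

```python
from pyspark.sql import SparkSession

# Sketch: dynamic allocation on K8s without an external shuffle service.
# spark.dynamicAllocation.shuffleTracking.enabled (Spark 3.0+) lets the
# driver track shuffle data itself, so executors can be released without
# relying on a node-local shuffle service that K8s pods do not provide.
spark = (
    SparkSession.builder.appName("Test-Spark")
    .config("spark.dynamicAllocation.enabled", "true")
    .config("spark.dynamicAllocation.shuffleTracking.enabled", "true")
    .config("spark.dynamicAllocation.minExecutors", "3")
    .config("spark.dynamicAllocation.maxExecutors", "6")
    .config("spark.dynamicAllocation.initialExecutors", "5")
    .config("spark.executor.memory", "12g")
    .config("spark.executor.cores", "4")
    .getOrCreate()
)
```

Whether the requested executors are actually scheduled also depends on the resource quotas and node capacity of the K8s namespace the MLRun job runs in, so it is worth checking those alongside the Spark settings.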

I can fully utilize the Spark cluster (MLRun & K8s & Spark environment) when I use static parameters in the Spark session (a Spark session with 'dynamicAllocation' parameters did not work for me). You can see a few working configurations below (note: the K8s infrastructure must provide at least the requested capacity, e.g. 3 executors and 12 cores in total):

Configuration with 3 executors, 9 cores in total:

spark = SparkSession.builder.appName('Test-Spark') \
    .config("spark.executor.memory", "9g") \
    .config("spark.executor.cores", "3") \
    .config('spark.cores.max', 9) \
    .getOrCreate()


Configuration with 2 executors, 8 cores in total:

spark = SparkSession.builder.appName('Test-Spark') \
    .config("spark.executor.memory", "9g") \
    .config("spark.executor.cores", "4") \
    .config('spark.cores.max', 8) \
    .getOrCreate()
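To confirm how many executors a given configuration actually received, one common (if unofficial) approach is to inspect `spark.sparkContext._jsc.sc().getExecutorMemoryStatus()`, which holds one entry per registered executor plus the driver. The `count_executors` helper below is a hypothetical illustration of that bookkeeping, shown offline with a fake status map:

```python
def count_executors(executor_memory_status):
    """Count worker executors from Spark's executor memory status map.

    The map has one "host:port" key per registered executor, including
    the driver itself, so one entry is subtracted for the driver.
    """
    return max(len(executor_memory_status) - 1, 0)


# On a live session (assumption: `spark` is an active SparkSession and
# the private _jsc gateway is accessible in your PySpark version):
# status = spark.sparkContext._jsc.sc().getExecutorMemoryStatus()
# print(count_executors({k: None for k in status.keys()}))

# Offline illustration with a fake status map:
fake_status = {
    "driver-host:7077": None,
    "executor-1:35000": None,
    "executor-2:35001": None,
}
print(count_executors(fake_status))  # 2 worker executors
```

Because `_jsc` is a private attribute, the Spark UI's Executors tab is the more robust way to verify this in practice.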


Alternatively, it is possible to use the Spark operator; for details, see the link.

