Writing Parquet with more than 100 columns via s3a fails on a Linux machine

Question

I am using s3a to read from a database into a DataFrame and write it out with .parquet("s3a://bucketname/folder"). It works for DataFrames with fewer than 100 columns, but with more than about 100 columns it crashes and exits spark-shell. I cannot find any material on whether this is a column limit, a version issue, or a memory issue. I am hoping to get some direction from the experienced community.

PS: The same code as below works in Eclipse on Windows on my local machine; the issue occurs only on the Linux instance.

Spark version: 2.4.0-cdh6.3.3
Scala version: 2.11.12
Java version: 1.8

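// Runs the query over the database connection and returns a DataFrame
def execute(sql: String) = { /* defined connection */ }
val df_sql = "select * from sampletable"
val df_exe = execute(df_sql)
// The output path must be a string literal
df_exe.write.parquet("s3a://bucketname/folder")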

Answer 1

I found the answer, in case someone else reaches this question. When calling spark-submit, increase the driver memory (--driver-memory) so that it can fit one partition of the file being written. I used 16g.
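
To confirm that a larger driver memory setting actually took effect, the JVM's maximum heap can be inspected from the driver. The sketch below is an illustration and not part of the original answer: the object name DriverHeapCheck is hypothetical, and it assumes the job was launched with something like spark-submit --driver-memory 16g (spark-shell accepts the same option). The value reported by the JVM is typically somewhat lower than the configured size.

```scala
// Hypothetical helper (not from the original post): reports the maximum heap
// available to the JVM it runs in. When run on the Spark driver, the value
// should roughly reflect the --driver-memory setting.
object DriverHeapCheck {
  private val BytesPerGb: Double = 1024.0 * 1024.0 * 1024.0

  // Maximum heap the current JVM will attempt to use, in GB.
  def maxHeapGb: Double = Runtime.getRuntime.maxMemory.toDouble / BytesPerGb

  def main(args: Array[String]): Unit =
    println(f"Driver JVM max heap: $maxHeapGb%.1f GB")
}
```

In spark-shell, the object can be pasted in and invoked with DriverHeapCheck.main(Array.empty) before running the write. A value far below the size of one partition being written would be consistent with the memory explanation above.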

Accepted answer, 2021-05-31 02:22:02
