
Using s3a on a Linux machine fails for >100-column parquet

I am using s3a to read from a database into a DataFrame and write it out with .parquet(s3a://bucketname/folder). It works for DataFrames with fewer than 100 columns, but crashes and exits spark-shell for more than roughly 100 columns. I cannot find any material on whether this is a column limitation, a version issue, or a memory issue. Hoping to find some direction from the experienced community.

PS: The same code as below works in Eclipse on Windows on my local machine, but the issue occurs on the Linux instance.

Spark version: 2.4.0-cdh6.3.3, Scala version: 2.11.12, Java version: 1.8

def execute(sql: String) = { /* defined connection; runs the query and returns a DataFrame */ }
val df_sql = "select * from sampletable"
val df_exe = execute(df_sql)
df_exe.write.parquet("s3a://bucketname/folder")

Found the answer, in case someone reaches this question: when calling spark-submit, increase the driver memory so that one partition of the file being written fits in it. I used 16g.
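For reference, a minimal sketch of such an invocation, assuming the job has been packaged into a jar; the class and jar names are placeholders, and only the --driver-memory 16g setting comes from the answer above (spark-shell accepts the same flag when working interactively):

spark-submit \
  --driver-memory 16g \
  --class com.example.WriteParquet \   # hypothetical main class
  write-parquet-job.jar                # hypothetical application jar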
