Spark task fails to write rows into ORC table

Question

I run the following code for a spatial join on geometry fields:

    val coverage = DimCoverageReader.apply(spark, params)
    coverage.createOrReplaceTempView("dim_coverage")

    val uniqueGeometries = spark.table(params.UniqueGeometriesTable)
    uniqueGeometries.createOrReplaceTempView("unique_geometries")

    spark
      .sql(
        """select a.*, b.lac, b.cell_id
          |from unique_geometries as a, dim_coverage as b
          |where ST_Intersects(ST_GeomFromWKT(a.geo_wkt), ST_GeomFromWKT(b.geo_wkt))
          |""".stripMargin)

The resulting DataFrame is later saved into an ORC table:

    Stage(spark, params).write
      .format("orc")
      .mode(SaveMode.Overwrite)
      .saveAsTable(params.IntersectGeometriesTable)

I get this error during execution: org.apache.spark.SparkException: Task failed while writing rows

0/10/30 17:37:19 ERROR Executor: Exception in task 205.0 in stage 4.0 (TID 1219)
org.apache.spark.SparkException: Task failed while writing rows
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:270)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Column has wrong number of index entries found: 320 expected: 800
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:803)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1742)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:352)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
    at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
    at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2413)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:76)
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:55)
    at org.apache.spark.sql.hive.orc.OrcOutputWriter.write(OrcFileFormat.scala:248)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:325)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
    at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
    ... 8 more

What is the root cause of this problem?

Answer 1

If the same write works fine with format("parquet"), my guess is that you have some sort of struct type or formatting issue in one of the columns. Can you add the printSchema() output for your DataFrame?
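
A minimal sketch of that check, in case it helps. The helper name `diagnoseWrite` and the `probeTable` argument are made up for illustration and are not part of the original code; it only assumes you have the joined result as a DataFrame. It prints the schema and then attempts the same overwrite as Parquet, so you can see whether the failure is specific to the ORC writer.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper (not from the original post): prints the schema of the
// DataFrame and tries the same kind of write, but as Parquet instead of ORC.
def diagnoseWrite(df: DataFrame, probeTable: String): Unit = {
  // Shows column names, types and nullability, including any nested structs.
  df.printSchema()

  // Same save mode as the failing job; only the format differs.
  df.write
    .format("parquet")
    .mode(SaveMode.Overwrite)
    .saveAsTable(probeTable)
}
```

You would call it with the joined result and a scratch table name of your choosing. If the Parquet write succeeds where the ORC write fails, that points at the ORC writer rather than at the join itself.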

solution1
0 2020-11-03 22:59:03