Delta table / Athena and Spark

Question

I have a Delta table that can be read from Athena.

When I try to query the same data from Spark, I get the following error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 80.0 failed 4 times, most recent failure: Lost task 0.3 in stage 80.0 (TID 449, ip-172-31-22-178.ec2.internal, executor 2): java.lang.RuntimeException: s3://<path>/BDA/DELTA/CLIENTE/_symlink_format_manifest/PERIODO=202001/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

If I run the same query in Athena, there are no problems.

Answer 1

This happens because your Delta table was created with a symlink manifest so that Athena can read it. The manifest is a text file listing the data files, not a Parquet file, so Spark fails when it tries to read it as Parquet. If you want to read the table with Spark, query it through the Delta reader, like this:

%sql select * from delta.`s3://path/tabla/` limit <n>
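For anyone doing this from PySpark rather than a %sql cell, here is a minimal sketch of the same idea. It assumes an existing SparkSession with the Delta Lake package already configured; the function names, the `s3://path/tabla/` path and the default limit of 10 are only illustrative. The first helper decodes the byte lists from the error message: the expected tail is the Parquet magic number, while the bytes actually found look like the end of a text line ending in ".parquet", which is consistent with Spark having opened the manifest file itself.

```python
def decode_magic(byte_values):
    """Turn an integer byte list from the error message into text."""
    return bytes(byte_values).decode("ascii")


def read_delta_table(spark, table_path, limit=10):
    """Read the table through the Delta reader instead of as raw Parquet.

    Assumptions: `spark` is a SparkSession with Delta Lake available, and
    `table_path` is the table root (the directory holding _delta_log),
    not the _symlink_format_manifest directory. The default limit is arbitrary.
    """
    return spark.read.format("delta").load(table_path).limit(limit)


if __name__ == "__main__":
    # Bytes Spark expected at the end of a Parquet file.
    print(repr(decode_magic([80, 65, 82, 49])))
    # Bytes Spark actually found at the end of the manifest file.
    print(repr(decode_magic([117, 101, 116, 10])))
```

Usage would be something like `read_delta_table(spark, "s3://path/tabla/").show()`. Pointing Spark at the table root with the "delta" format means it resolves the data files through the transaction log, so the manifest directory that Athena relies on is never read as data.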
