Delta Table / Athena and Spark

I have a Delta table that can be read from Athena.

When I try to get the data through a query from Spark, I get the following error:

Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 80.0 failed 4 times, most recent failure: Lost task 0.3 in stage 80.0 (TID 449, ip-172-31-22-178.ec2.internal, executor 2): java.lang.RuntimeException: s3://<path>/BDA/DELTA/CLIENTE/_symlink_format_manifest/PERIODO=202001/manifest is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [117, 101, 116, 10]

If I run the same query in Athena, there are no problems.

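This happens because your Delta table was created with a manifest so that it can be read in Athena, and Spark is being pointed at that manifest. The manifest under _symlink_format_manifest is a plain-text list of data file paths, not a Parquet file: the expected tail bytes [80, 65, 82, 49] are the Parquet magic number "PAR1", while the bytes found, [117, 101, 116, 10], decode to "uet" plus a newline, most likely the end of a line ending in ".parquet". If you want to read the table with Spark, query it as a Delta table by its path, like this: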

%sql select * from delta.`s3://path/tabla/` limit <n>
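
For anyone running this from PySpark rather than a %sql notebook cell, the same path-based query can be sketched roughly as below. This is only an illustration, assuming an existing SparkSession named spark with Delta Lake configured; the helper names delta_path_query and read_delta are hypothetical, and s3://path/tabla/ stands for the table root (the directory that contains _delta_log), not the _symlink_format_manifest folder.

```python
from typing import Optional


def delta_path_query(table_root: str, limit: Optional[int] = None) -> str:
    """Build a path-based Delta query of the form SELECT * FROM delta.`<root>`.

    Hypothetical helper: table_root must be the Delta table root, not the
    _symlink_format_manifest directory that Athena uses.
    """
    if "`" in table_root:
        # A backtick inside the path would break the quoted identifier.
        raise ValueError("table_root must not contain a backtick")
    query = f"SELECT * FROM delta.`{table_root}`"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query


def read_delta(spark, table_root: str, limit: Optional[int] = None):
    """Run the query on an existing SparkSession (assumed Delta-enabled)."""
    return spark.sql(delta_path_query(table_root, limit))


# Usage in a notebook where `spark` already exists (assumption):
#   df = read_delta(spark, "s3://path/tabla/", limit=10)
#   df.show()
# DataFrame API equivalent, without SQL:
#   df = spark.read.format("delta").load("s3://path/tabla/")
```

Either form makes Spark resolve the data files through the Delta transaction log, so the manifest files that Athena relies on are never opened as Parquet.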
