AWS Glue job writing to S3 in Parquet format fails intermittently with Not Found

I've been creating PySpark jobs and I keep getting the same intermittent error (it seems almost random):

An error occurred while calling o129.parquet. Not Found 
(Service: Amazon S3; Status Code: 404; Error Code: 404 Not Found; 
Request ID: D2FA355F92AF8F05; S3 Extended Request ID: 1/fWdf1DurwPDP40HDGARlMRO/7lKzFDJ4g7DbUnM04wUvG89CG9w5T+u4UxapkWp20MfQfdjsE=)

I'm not even reading from S3; what I'm actually doing is:

df.coalesce(100).write.partitionBy("mth").mode("overwrite").parquet("s3://"+bucket+"/"+path+"/out")
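For reference, the same write expressed as a self-contained sketch. The SparkSession, the sample DataFrame, and the bucket/prefix values are placeholders I made up; only the coalesce/partitionBy/mode/parquet call chain comes from the job above.

# Minimal sketch of the write from the question; everything except the
# final call chain is a hypothetical stand-in.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("parquet-write-sketch").getOrCreate()

bucket = "my-bucket"   # hypothetical placeholder
path = "some/prefix"   # hypothetical placeholder

df = spark.createDataFrame(
    [("2020-01", 1), ("2020-02", 2)],
    ["mth", "value"],
)

# Reduce the number of output files, partition the Parquet output by the
# "mth" column, and overwrite whatever already exists under the prefix.
(df.coalesce(100)
   .write
   .partitionBy("mth")
   .mode("overwrite")
   .parquet("s3://" + bucket + "/" + path + "/out"))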

So I changed the coalesce partition count, but I'm not sure what else I should do to mitigate this error and make my jobs more stable.

For reading the file from S3 using Glue:

datasource0 = glueContext.create_dynamic_frame.from_options(connection_type = "s3", connection_options = {"paths": ["s3/path"]}, format = "json", transformation_ctx = "datasource0")

For writing the file to S3 using Glue:

output = glueContext.write_dynamic_frame.from_options(frame = df, connection_type = "s3", connection_options = {"path": "s3/path"}, format = "parquet", transformation_ctx = "output")
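Note that in the question the data is a Spark DataFrame (df), while write_dynamic_frame expects a DynamicFrame, so a conversion is needed first. A minimal sketch, assuming the bucket/path variables from the question and using DynamicFrame.fromDF plus the partitionKeys connection option to keep the partitioning by "mth":

from awsglue.dynamicframe import DynamicFrame

# Convert the Spark DataFrame from the question into a DynamicFrame so it
# can be written with glueContext; "out_dyf" is just an arbitrary name.
out_dyf = DynamicFrame.fromDF(df.coalesce(100), glueContext, "out_dyf")

# Write Parquet to S3, partitioned by "mth" as in the original job; the
# bucket/path values are the question's placeholders, not real paths.
output = glueContext.write_dynamic_frame.from_options(
    frame = out_dyf,
    connection_type = "s3",
    connection_options = {
        "path": "s3://" + bucket + "/" + path + "/out",
        "partitionKeys": ["mth"],
    },
    format = "parquet",
    transformation_ctx = "output",
)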
