AWS Glue Python job limits the amount of data written to an S3 bucket?
I've created a Glue job that reads data from the Glue catalog and saves it to an S3 bucket in Parquet format. It works correctly, but the number of items is limited to 20: every time the job is triggered, only 20 items get saved in the bucket, and I would like to save all of them. Maybe I'm missing some additional property in the Python script.
Here is the script (generated by AWS):
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
# Read the source table from the Glue Data Catalog
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "cargoprobe_data", table_name = "dev_scv_completed_executions", transformation_ctx = "datasource0")
# Map source columns to target columns (field list elided in the post)
applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [*field list*], transformation_ctx = "applymapping1")
# Resolve ambiguous column types by representing them as structs
resolvechoice2 = ResolveChoice.apply(frame = applymapping1, choice = "make_struct", transformation_ctx = "resolvechoice2")
# Drop fields that are null in every record
dropnullfields3 = DropNullFields.apply(frame = resolvechoice2, transformation_ctx = "dropnullfields3")
# Write to S3 as Parquet; Spark writes one part file per partition
datasink4 = glueContext.write_dynamic_frame.from_options(frame = dropnullfields3, connection_type = "s3", connection_options = {"path": "s3://bucketname"}, format = "parquet", transformation_ctx = "datasink4")
job.commit()
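One way to check whether rows are really being lost, rather than just split across several output files, is to count the records right before the write. A minimal sketch, assuming `frame` is an awsglue `DynamicFrame` (whose `count()` method returns the number of records); the helper name `log_record_count` is hypothetical:

```python
def log_record_count(frame, label):
    """Print and return the number of records in a DynamicFrame.

    Assumes `frame` is an awsglue DynamicFrame (or any object with a
    count() method); `label` only tags the log line.
    """
    # count() runs a Spark job over the whole frame, so it adds some runtime
    n = frame.count()
    print(f"{label}: {n} records")
    return n
```

In the script above you could call `log_record_count(dropnullfields3, "dropnullfields3")` just before the `datasink4` line. If it reports more than 20 records, the job is reading everything and the "20 items" in the bucket are files, not rows.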
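This is done automatically in the background; it is called partitioning. Spark writes one Parquet part file per partition of the DynamicFrame, so the 20 "items" you see in the bucket are most likely 20 files that together hold all of your records. You can repartition by calling
partitioned_df = dropnullfields3.repartition(1)
to repartition your DynamicFrame into a single partition, and therefore a single output file. Remember to pass partitioned_df (not dropnullfields3) as the frame argument of write_dynamic_frame.from_options.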
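Putting the repartition and the write together, a minimal sketch of the final step might look like the following. The helper name `write_as_single_parquet_file` is hypothetical; it assumes `glue_context` is the job's `GlueContext` and `frame` is a `DynamicFrame`, and it only uses `DynamicFrame.repartition` and `write_dynamic_frame.from_options` with the arguments already shown in the question's script.

```python
def write_as_single_parquet_file(glue_context, frame, path, transformation_ctx="datasink4"):
    """Write `frame` to `path` as Parquet, collapsed into one partition.

    Assumes `glue_context` is an awsglue GlueContext and `frame` is a
    DynamicFrame; the helper itself is a hypothetical convenience wrapper.
    """
    # Collapse the data into a single partition so Spark emits one part file
    single = frame.repartition(1)
    # Write the repartitioned frame; writing `frame` would still give one file per partition
    return glue_context.write_dynamic_frame.from_options(
        frame=single,
        connection_type="s3",
        connection_options={"path": path},
        format="parquet",
        transformation_ctx=transformation_ctx,
    )
```

In the question's script this would replace the `datasink4 = ...` line with `datasink4 = write_as_single_parquet_file(glueContext, dropnullfields3, "s3://bucketname")`. Funnelling everything into one partition means a single task writes the whole dataset, which can be slow for large tables; keeping the default partitioning is usually fine, because Parquet readers such as Spark or Athena read every part file under the path.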
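Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.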