
AWS Glue Job - pass glue catalog table names as parameters

I have an AWS Glue job written in PySpark that loads data from S3/a Glue catalog database into Snowflake. How can I pass table names as parameters and run the AWS Glue job in parallel?

Can we do it inside the Glue job itself, or do we need Lambda functions?

Please suggest an approach and share any code or articles.

Thank you in advance.

Thanks, Jo

AWS Glue lets you supply your own script, so it is very flexible. You can pass table names as parameters:

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api-crawler-pyspark-extensions-get-resolved-options.html
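For example, a comma-separated list of tables could be read with `getResolvedOptions`. This is a minimal sketch; the parameter name `table_names` is just an illustration, passed when the job is started (e.g. `--table_names "orders,customers"`):

```python
import sys

from awsglue.utils import getResolvedOptions

# 'table_names' is a hypothetical job parameter holding a
# comma-separated list of Glue catalog table names.
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'table_names'])
table_names = [t.strip() for t in args['table_names'].split(',')]
```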

In this case, the Glue job can process these tables sequentially (see the sketch after this list):

  • parse the input parameters
  • loop over the tables:
    • create_dynamic_frame to read the table
    • transform if necessary
    • write to Snowflake
    • process the next table
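A minimal sketch of that loop, assuming the Snowflake Spark connector is attached to the job; the catalog database name and all `sf*` connection options are placeholders:

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# In practice, parsed from the job parameters as shown above.
table_names = ["orders", "customers"]

# Placeholder Snowflake connection options for the Spark connector;
# real values would come from job configuration or Secrets Manager.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

for table in table_names:
    # Read the table from the Glue Data Catalog
    # ('my_catalog_db' is a placeholder database name).
    dyf = glue_context.create_dynamic_frame.from_catalog(
        database="my_catalog_db", table_name=table
    )
    # Transform if necessary, then write to Snowflake
    # through the Snowflake Spark connector.
    (dyf.toDF()
        .write
        .format("net.snowflake.spark.snowflake")
        .options(**sf_options)
        .option("dbtable", table)
        .mode("overwrite")
        .save())
```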

If you want to run a separate Glue job run for each table so they are processed in parallel, then pass only one table name to the Glue job and call the same job multiple times, once per table (see the example after the link below).

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-calling.html
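One way to fan out the runs, from a Lambda function or any script using boto3; the job name, parameter name, and table list below are placeholders:

```python
import boto3

glue = boto3.client("glue")

# 'my-glue-job' and '--table_name' are placeholders;
# each run receives exactly one table name.
for table in ["orders", "customers", "invoices"]:
    response = glue.start_job_run(
        JobName="my-glue-job",
        Arguments={"--table_name": table},
    )
    print(table, response["JobRunId"])
```

Note that the job's "Max concurrent runs" setting defaults to 1, so it must be raised for these runs to actually execute in parallel.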

Glue provisions a Spark cluster for each run, sized by the "Number of workers" setting.

I do not know how many tables you will process or how often the Glue job will run, but it could be better to process the tables sequentially with a bigger cluster to utilize resources efficiently.
