Parametrized/reusable AWS Glue job
I'm new to AWS and I'm trying to create a parametrized AWS Glue job that takes the following input parameters:
Has anyone done something similar?
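First, I'm not sure you can limit the data by size. I would suggest limiting it by number of rows instead. As I described in "AWS Glue Job Input Parameters", you can pass the first two variables directly to the job. As for the list of variables, if it contains a large number of entries I'm afraid you won't be able to supply them through the standard job parameters. In that case I would suggest providing these variables the same way as the data, i.e. in a flat file. For example: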
var1;var2;var3
1;2;3
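A minimal sketch of how a file in that layout could be turned into a name-to-value mapping. The helper name `read_variables` is hypothetical, and the text is passed in as a string here; in an actual job it would first have to be read from wherever the file is stored (for example S3). Note that every value comes back as a string.

```python
import csv
import io

def read_variables(text, delimiter=";"):
    """Parse a header row plus exactly one value row into a dict of name -> value."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    if len(rows) != 1:
        raise ValueError(f"expected exactly one value row, got {len(rows)}")
    return dict(rows[0])

# Same layout as the example above: names on the first line, values on the second.
sample = "var1;var2;var3\n1;2;3\n"
variables = read_variables(sample)
print(variables)
```

Anything numeric still needs an explicit conversion such as `int(variables["var1"])` before use.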
To summarize, I suggest defining the following input variables:
Here is a code sample:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME','SOURCE_DB','SOURCE_TAB','NUM_ROWS','DEST_FOLDER'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
df_new = glueContext.create_dynamic_frame.from_catalog(database = args['SOURCE_DB'], table_name = args['SOURCE_TAB'], transformation_ctx = "full_data")
df_0 = df_new.toDF()
df_0.createOrReplaceTempView("spark_dataframe")
choice_data = spark.sql("Select x,y,z from spark_dataframe")
choice_data = choice_data.limit(int(args['NUM_ROWS']))
choice_data.repartition(1).write.format('csv').mode('overwrite').options(delimiter=',',header=True).save("s3://"+ args['DEST_FOLDER'] +"/")
job.commit()
Of course, you also have to provide the corresponding input variables in the Glue job configuration.
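If you start the job from code rather than from the console, the same variables can be passed as job arguments. This is only a sketch: the job name and all values below are hypothetical placeholders, and `build_glue_arguments` / `start_job` are helper names made up for this example. It assumes boto3's Glue client and its `start_job_run` call, where each argument key is prefixed with `--`.

```python
def build_glue_arguments(params):
    """Prefix each parameter name with '--' and stringify the value."""
    return {f"--{name}": str(value) for name, value in params.items()}

def start_job(job_name, params):
    """Start a Glue job run with the given parameters and return its run id."""
    import boto3  # imported lazily so build_glue_arguments works without boto3
    glue = boto3.client("glue")
    response = glue.start_job_run(
        JobName=job_name,
        Arguments=build_glue_arguments(params),
    )
    return response["JobRunId"]

# Hypothetical values; the keys match the names read by getResolvedOptions above.
arguments = build_glue_arguments({
    "SOURCE_DB": "my_database",
    "SOURCE_TAB": "my_table",
    "NUM_ROWS": 100,
    "DEST_FOLDER": "my-bucket/output",
})
print(arguments)
# start_job("my-glue-job", {...}) would submit the run; it needs AWS credentials.
```

In the console, the equivalent is adding the same `--KEY` / value pairs under the job parameters.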
args = getResolvedOptions(sys.argv, ['JOB_NAME','source_db','source_table','count','dest_folder'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
df_new = glueContext.create_dynamic_frame.from_catalog(database = args['source_db'], table_name = args['source_table'], transformation_ctx = "sample_data")
df_0 = df_new.toDF()
df_0.registerTempTable("spark_dataframe")
new_data = spark.sql("Select * from spark_dataframe")
sample = new_data.limit(args['count'])
sample.repartition(1).write.format('csv').options(delimiter=',',header=True).save("s3://"+ args['dest_folder'] +"/")
job.commit()
I am getting an error for this line:
sample = new_data.limit(args['count'])
Error:
py4j.Py4JException: Method limit([class java.lang.String]) does not exist
but the argument I passed is not a string.
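For what it's worth, job arguments reach the script as command-line text, so `args['count']` is a `str` even when a number was entered in the job configuration. That matches the `limit([class java.lang.String])` in the error, and it is why the answer's sample wraps the value in `int(...)`. The sketch below uses `argparse` as a stand-in, since `awsglue` is only available inside the Glue runtime; `resolve_options` is a hypothetical helper, not the real `getResolvedOptions`.

```python
import argparse

def resolve_options(argv, names):
    """Stand-in for getResolvedOptions: read '--name value' pairs from argv."""
    parser = argparse.ArgumentParser()
    for name in names:
        parser.add_argument(f"--{name}", required=True)
    parsed, _unknown = parser.parse_known_args(argv)
    return vars(parsed)

# Hypothetical argument list, shaped like what a job run would receive.
args = resolve_options(
    ["--count", "100", "--dest_folder", "my-bucket/out"],
    ["count", "dest_folder"],
)
print(type(args["count"]).__name__)  # prints: str

# Convert before handing the value to Spark:
row_limit = int(args["count"])
# sample = new_data.limit(row_limit)
```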