AWS Glue PySpark Transformation Filter API not working
I am new to AWS Glue and Python. I am trying to apply Filter.apply to the DynamicFrame datasource0 to produce filter3frame.
The job run failed, and the logs say that the filter_sex function is not defined.
Exact error: "NameError: filter_sex is not defined".
Can anyone tell me what I am doing wrong?
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "amssurvey", table_name = "amssurvey", transformation_ctx = "datasource0")
filter1frame = Filter.apply(frame=datasource0, f=lambda x:x['citizen'] in ["US"])
filter2frame = Filter.apply(frame=datasource0, f=lambda x:x['count'] > 50)
filter3frame = Filter.apply(frame=datasource0, f=filter_sex(datasource0))
filter1_op = glueContext.write_dynamic_frame.from_options(frame = filter1frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter1_op"}, format = "csv", transformation_ctx = "filter1_op")
filter2_op = glueContext.write_dynamic_frame.from_options(frame = filter2frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter2_op"}, format = "csv", transformation_ctx = "filter2_op")
filter3_op = glueContext.write_dynamic_frame.from_options(frame = filter3frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter3_op"}, format = "csv", transformation_ctx = "filter3_op")
job.commit()
def filter_sex(item):
    if item['sex'] == 'Male':
        return True
    else:
        return False
Instead of defining a function, why don't you try the code below?
filter3frame = Filter.apply(frame=datasource0, f=lambda x: x['sex'] == 'Male')
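Regarding the error: filter_sex should be defined before it is used. This is a NameError raised at run time rather than a compilation error. Python runs the script from top to bottom, so the def statement has to execute before the line that references the function.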
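To illustrate the ordering point, here is a minimal sketch in plain Python with no Glue libraries involved. The names is_male and run_too_early are made up for this example and are not part of the original job.

```python
# Stand-alone illustration of "define before use".
# The names is_male and run_too_early are hypothetical.

def run_too_early():
    try:
        # At the time this function is called below, the 'def is_male'
        # statement has not executed yet, so the name lookup fails.
        return is_male({'sex': 'Male'})
    except NameError as exc:
        return str(exc)

# Called before is_male is defined: captures the NameError message.
early_result = run_too_early()

def is_male(item):
    # Return the comparison result directly.
    return item['sex'] == 'Male'

# Called after the definition has executed: works as expected.
late_result = is_male({'sex': 'Male'})

print(early_result)
print(late_result)
```

Moving the def above the first line that uses the function is enough to make the error go away.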
I got it fixed.
As @QuickSilver said, every function has to be defined before it is used. Also, the Filter.apply call has to be written as below: filter_sex is passed by name, as a function reference, and is not called with an argument at that point.
filter3frame = Filter.apply(frame=datasource0, f=filter_sex)
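The difference between passing the function and calling it can be sketched in plain Python. Here apply_filter is a hypothetical stand-in that only mimics the idea of Filter.apply (call f once per record and keep the records for which it returns true); it is not the Glue implementation, and the sample records are invented for illustration.

```python
def filter_sex(item):
    # Predicate applied to a single record.
    return item['sex'] == 'Male'

def apply_filter(records, f):
    """Hypothetical stand-in for Filter.apply: keep records where f(record) is true."""
    return [r for r in records if f(r)]

# Invented sample data, shaped like rows of the survey table.
records = [
    {'sex': 'Male', 'count': 60},
    {'sex': 'Female', 'count': 75},
    {'sex': 'Male', 'count': 10},
]

# Correct: pass the function object; apply_filter calls it per record.
kept = apply_filter(records, f=filter_sex)

# Incorrect: filter_sex(records) runs immediately on the whole collection,
# before apply_filter is even entered. With a plain list this raises TypeError.
try:
    apply_filter(records, f=filter_sex(records))
    call_error = None
except TypeError as exc:
    call_error = type(exc).__name__

print(kept)
print(call_error)
```

In the original job the same reasoning applies: f=filter_sex(datasource0) evaluates the call first, whereas f=filter_sex hands the function over to be invoked for each record.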
So the final working code is as follows:
import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job
def filter_sex(item):
    if item['sex'] == 'Male':
        return True
    else:
        return False
## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)
## @type: DataSource
## @args: [database = "amssurvey", table_name = "amssurvey", transformation_ctx = "datasource0"]
## @return: datasource0
## @inputs: []
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "amssurvey", table_name = "amssurvey", transformation_ctx = "datasource0")
## @type: ApplyMapping
## @args: [mapping = [("nomber", "long", "nomber", "long"), ("type", "string", "type", "string"), ("sex", "string", "sex", "string"), ("citizen", "string", "citizen", "string"), ("count", "long", "count", "long"), ("countstate", "long", "countstate", "long")], transformation_ctx = "applymapping1"]
## @return: applymapping1
## @inputs: [frame = datasource0]
filter1frame = Filter.apply(frame=datasource0, f=lambda x:x['citizen'] in ["US"])
filter2frame = Filter.apply(frame=datasource0, f=lambda x:x['count'] > 50)
filter3frame = Filter.apply(frame=datasource0, f=filter_sex)
filter1_op = glueContext.write_dynamic_frame.from_options(frame = filter1frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter1_op"}, format = "csv", transformation_ctx = "filter1_op")
filter2_op = glueContext.write_dynamic_frame.from_options(frame = filter2frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter2_op"}, format = "csv", transformation_ctx = "filter2_op")
filter3_op = glueContext.write_dynamic_frame.from_options(frame = filter3frame, connection_type = "s3", connection_options = {"path": "s3://asgqatestautomation3/SourceFiles/filter3_op"}, format = "csv", transformation_ctx = "filter3_op")
job.commit()