
AWS Glue ETL - Converting Epoch to timestamp

As the title states, I'm having trouble converting a column on a Dynamic Frame from Epoch to a timestamp.

I have tried converting it into a Data Frame and back to a Dynamic Frame, but it is not working.

This is what I'm working with:

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame

from pyspark.sql import functions as f
from pyspark.sql import types as t
from pyspark.sql.functions import udf

from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

datasource0 = glueContext.create_dynamic_frame.from_catalog(database = "s3-sat-dth-prd", table_name = "s3_sat_dth_prd_vehicle", transformation_ctx = "datasource0")

applymapping1 = ApplyMapping.apply(frame = datasource0, mappings = [("in", "int", "in", "int"), ("out", "int", "out", "int"), ("ts", "long", "ts", "long"), ("cam", "string", "cam", "string"), ("subclass", "string", "subclass", "string")], transformation_ctx = "applymapping1")   

selectfields2 = SelectFields.apply(frame = applymapping1, paths = ["in", "out", "ts", "cam", "subclass"], transformation_ctx = "selectfields2")

resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "resolvechoice3")

datasink4 = glueContext.write_dynamic_frame.from_catalog(frame = resolvechoice3, database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "datasink4")
job.commit()

What I've tried is creating a Data Frame with tsconvert = resolvechoice3.toDF() and turning it back into a Dynamic Frame with resolvechoice4 = DynamicFrame.fromDF(tsconvert, GlueContext, resolvechoice4); I get a syntax error in that last snippet, right at the end at resolvechoice4.

I could not find anything built into Glue to convert to a timestamp. Once I make sure the data is correctly written to S3, Redshift will be my target.

Has anybody ever done anything like this and could show me the way?

Thanks in advance.

AWS Glue can use the Spark SQL functions (imported from the pyspark package), which let you transform epoch timestamps into a human-readable or any other desired date format.

Example:

from pyspark.sql.functions import from_unixtime, col

resolvechoice3 = ResolveChoice.apply(frame = selectfields2, choice = "MATCH_CATALOG", database = "s3-sat-dth-prd", table_name = "test_split_array_into_records_json", transformation_ctx = "resolvechoice3")

# Convert to a Spark DataFrame, transform the epoch column, then convert back to a DynamicFrame
tsColumnName = "ts"  # the epoch column from the mapping above
tsconvert = resolvechoice3.toDF()
tsconverted = tsconvert.withColumn(tsColumnName, from_unixtime(col(tsColumnName)))
resolvechoice4 = DynamicFrame.fromDF(tsconverted, glueContext, "transformedDF")
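
You can then pass resolvechoice4, instead of resolvechoice3, to the write_dynamic_frame call at the end of your original script so the converted column is what gets written out.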

Depending on your needs, you can define the date format in a similar way using the date functions from the pyspark.sql.functions module.
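
For instance, here is a minimal sketch (assuming the epoch column is ts and holds seconds; the ts_str column and ts_formatted variable are just illustrative names) that produces both a formatted string and a proper timestamp-typed column:

from pyspark.sql.functions import from_unixtime, col

# formatted string column, e.g. "2020-01-31 23:59:59" (illustrative)
ts_formatted = tsconvert.withColumn("ts_str", from_unixtime(col("ts"), "yyyy-MM-dd HH:mm:ss"))

# proper TimestampType column, useful if the Redshift target column is TIMESTAMP
ts_formatted = ts_formatted.withColumn("ts", from_unixtime(col("ts")).cast("timestamp"))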
