
AWS Glue ETL to Redshift: DATE

I am using AWS Glue to ETL data into Redshift. I keep running into an issue where my date column loads as null in Redshift.

What I have set up:

  • Upload a CSV file to S3. Sample data:

item  | color | price | date
shirt | brown | 25.05 | 03-01-2018
pants | black | 20.99 | 02-14-2017

  • Crawl the S3 object with a Glue crawler

  • Create a Redshift table with this schema:

    item: string
    color: string
    price: decimal / numeric
    date: date

  • Write a script to load the data into Redshift:


    import sys
    from awsglue.transforms import *
    from awsglue.utils import getResolvedOptions
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext
    from awsglue.job import Job
    from pyspark.sql.functions import to_date, col
    from awsglue.dynamicframe import DynamicFrame
    
    glueContext = GlueContext(SparkContext.getOrCreate())
    
    items_dynamicframe = glueContext.create_dynamic_frame.from_catalog(
           database = "rdshft-test",
           table_name = "items")
    items_dynamicframe.printSchema()
    
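    # Attempt to get the date loaded correctly into Redshift:
    # convert to a Spark DataFrame and parse the string column as a date.
    # The sample dates are month-day-year (e.g. 02-14-2017), so the pattern is
    # "MM-dd-yyyy"; a day-first "d-M-Y" cannot match, and "Y" means week-based year.
    data_frame = items_dynamicframe.toDF()
    data_frame.show()
    data_frame = data_frame.withColumn("date",
              to_date(col("date"), "MM-dd-yyyy"))
    data_frame.show()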
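The sample dates are month-day-year, so a day-first pattern cannot parse them reliably. The stdlib check below is only an analogy for Spark's to_date: it uses Python's strptime codes, not Spark's datetime patterns. With the day first, 03-01-2018 is silently read as 3 January, and 02-14-2017 fails outright because there is no month 14. Depending on the Spark version and settings, to_date may return null for such rows instead of raising, which could explain the nulls in Redshift; that is an assumption, not something the post confirms.

```python
from datetime import date, datetime
from typing import Optional

SAMPLE_DATES = ["03-01-2018", "02-14-2017"]  # the two rows from the sample CSV


def parse_or_none(value: str, fmt: str) -> Optional[date]:
    """Parse value with fmt; return None on a mismatch, much like a null column."""
    try:
        return datetime.strptime(value, fmt).date()
    except ValueError:
        return None


# Month first: the Spark pattern "MM-dd-yyyy" corresponds to "%m-%d-%Y".
month_first = [parse_or_none(s, "%m-%d-%Y") for s in SAMPLE_DATES]
# Day first, like a "d-M-Y" style pattern.
day_first = [parse_or_none(s, "%d-%m-%Y") for s in SAMPLE_DATES]

print(month_first)  # [datetime.date(2018, 3, 1), datetime.date(2017, 2, 14)]
print(day_first)    # [datetime.date(2018, 1, 3), None]
```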

Any feedback is appreciated. Thank you.

I was able to resolve this issue by converting back to a DynamicFrame. When pulling my data into the notebook, I use a DynamicFrame. But to convert the string to a date, I have to use a DataFrame (more specifically, PySpark SQL functions). To load into Redshift, I then have to convert back to a DynamicFrame. I assume this is a Glue requirement?
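The round trip described above (DynamicFrame, then DataFrame for to_date, then back to DynamicFrame for the write) can be sketched as a single Glue job function. This is a minimal sketch, not the poster's actual script: the Glue connection name, Redshift database, target table and S3 temp directory are hypothetical placeholders, and the "MM-dd-yyyy" pattern assumes the month-first sample data. It converts back with DynamicFrame.fromDF and writes with glueContext.write_dynamic_frame.from_jdbc_conf, which stages the Redshift load in redshift_tmp_dir. The awsglue and pyspark imports sit inside the function so the file can be imported outside a Glue job; call load_items_to_redshift() from the job script.

```python
# Hypothetical placeholders (not from the post): adjust to your own setup.
REDSHIFT_CONNECTION = "my-redshift-connection"  # Glue catalog connection name
REDSHIFT_DATABASE = "dev"                        # target Redshift database
REDSHIFT_TABLE = "public.items"                  # target Redshift table
REDSHIFT_TMP_DIR = "s3://my-temp-bucket/redshift-tmp/"  # S3 staging path

# The sample dates are month-day-year, e.g. 02-14-2017.
DATE_PATTERN = "MM-dd-yyyy"


def load_items_to_redshift():
    """Read the crawled table, parse `date`, convert back, and write to Redshift."""
    # Imported here so this module can be loaded outside a Glue job.
    from awsglue.context import GlueContext
    from awsglue.dynamicframe import DynamicFrame
    from pyspark.context import SparkContext
    from pyspark.sql.functions import col, to_date

    glue_context = GlueContext(SparkContext.getOrCreate())

    # 1. DynamicFrame from the Data Catalog table created by the crawler.
    items_dyf = glue_context.create_dynamic_frame.from_catalog(
        database="rdshft-test", table_name="items")

    # 2. Spark DataFrame so the PySpark SQL to_date function can be used.
    items_df = items_dyf.toDF().withColumn(
        "date", to_date(col("date"), DATE_PATTERN))

    # 3. Back to a DynamicFrame, which the Glue writers take as input.
    items_dyf = DynamicFrame.fromDF(items_df, glue_context, "items_with_dates")

    # 4. Write through the Glue JDBC connection; the load is staged in S3.
    glue_context.write_dynamic_frame.from_jdbc_conf(
        frame=items_dyf,
        catalog_connection=REDSHIFT_CONNECTION,
        connection_options={"dbtable": REDSHIFT_TABLE,
                            "database": REDSHIFT_DATABASE},
        redshift_tmp_dir=REDSHIFT_TMP_DIR)
    return items_dyf
```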
