
Combining fields in AWS Glue jobs

My goal is to extract records from a DDB table and transform them into an S3 JSON object.

The records in the DDB table have these fields:

accounts, displayname, name, objectname, objectId, objectDefinition

The required definition of the S3 object is:

{
   "accounts": "example",
   "name": "exampleDisplay",
   "objectJson": "{\"displayname\":\"exampleDisplay\",\"objectname\":\"exampleObjectName\",\"objectId\":\"exampleObjectId\",\"objectDefinition\":\"exampleDefinition\"}"
}

Now, the Spark transform script that converts the field names to camelCase is easy. But how do I create a new field like objectJson and add certain fields from DDB to it as JSON?

You can do this in the following way:

from pyspark.sql import functions as f

df = spark.createDataFrame(
    [('accounts', 'name', 'displayname', 'objectname', 'objectId', 'objectDefinition'),
     ('accounts', 'name', 'displayname', 'objectname', 'objectId', 'objectDefinition')],
    ['accounts', 'name', 'displayname', 'objectname', 'objectId', 'objectDefinition'])

df2 = df.withColumn('objectJson', f.struct(
    f.col('displayname'),
    f.col('objectname'),
    f.col('objectId'),
    f.col('objectDefinition')
)).select('accounts', 'name', 'objectJson')

df2.toJSON().take(100)

[{"accounts":"accounts",
   "name":"name",
   "objectJson":{
                 "displayname":"displayname",
                 "objectname":"objectname",
                 "objectId":"objectId",
                 "objectDefinition":"objectDefinition"
}}]

df2.repartition(1).write.json('path/to/save/data')
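One caveat: the required definition above quotes objectJson as a JSON *string*, whereas `f.struct` produces a nested object. If the string form is needed, the struct can be wrapped in `f.to_json(f.struct(...))`. The double-encoding this produces can be illustrated with plain Python (the field values below are the placeholder examples from the question):

```python
import json

# Inner payload that should end up serialized inside the record.
inner = {
    "displayname": "exampleDisplay",
    "objectname": "exampleObjectName",
    "objectId": "exampleObjectId",
    "objectDefinition": "exampleDefinition",
}

# Encode the inner dict first, then embed the resulting string,
# so objectJson is a string field rather than a nested object.
record = {
    "accounts": "example",
    "name": "exampleDisplay",
    "objectJson": json.dumps(inner),
}

print(json.dumps(record))
```

Note that objectJson round-trips: `json.loads(record["objectJson"])` recovers the inner dict, which is what a downstream consumer of the S3 object would do.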

Hope it helps.
