Combining fields in AWS Glue jobs
My goal is to extract records from a DynamoDB (DDB) table and transform them into an S3 JSON object.
The records in the DDB table have these fields:
accounts, displayname, name, objectname, objectId, objectDefinition
The required S3 object definition is:
{
  "accounts": "example",
  "name": "exampleDisplay",
  "objectJson": "{\"displayname\":\"exampleDisplay\",\"objectname\":\"exampleObjectName\",\"objectId\":\"exampleObjectId\",\"objectDefinition\":\"exampleDefinition\"}"
}
Now, the Spark transform script to convert the field names to camelCase is easy. But how do I create a new field like objectJson and add certain fields from DDB to it as JSON?
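For reference, the extract side (reading the DDB table into a Spark DataFrame inside the Glue job) looks roughly like this. This is a minimal sketch assuming a standard Glue PySpark job; the table name and connection options are placeholders.
from awsglue.context import GlueContext
from pyspark.context import SparkContext

# Minimal sketch of the extract step; "my-ddb-table" is a placeholder
glue_context = GlueContext(SparkContext.getOrCreate())
dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="dynamodb",
    connection_options={"dynamodb.input.tableName": "my-ddb-table"})
df = dyf.toDF()  # plain Spark DataFrame for the transforms that follow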
You can do this in the following way:
from pyspark.sql import SparkSession
import pyspark.sql.functions as f

spark = SparkSession.builder.getOrCreate()
cols = ['accounts', 'name', 'displayname', 'objectname', 'objectId', 'objectDefinition']
# Two identical sample rows standing in for the DDB records
df = spark.createDataFrame([tuple(cols), tuple(cols)], cols)
# Pack the four object fields into a nested struct column
df2 = df.withColumn('objectJson', f.struct(
    f.col('displayname'),
    f.col('objectname'),
    f.col('objectId'),
    f.col('objectDefinition')
)).select('accounts', 'name', 'objectJson')
df2.toJSON().take(100)
[{"accounts":"accounts",
"name":"name",
"objectJson":{
"displayname":"displayname",
"objectname":"objectname",
"objectId":"objectId",
objectDefinition":"objectDefinition"
}}]
df2.repartition(1).write.json('path/to/save/data')
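Note that f.struct yields objectJson as a nested object, while the required S3 definition above shows it as a serialized JSON string. If you need the string form, wrapping the struct in f.to_json should do it; a small variant of the above (df3 is just an illustrative name):
df3 = df.withColumn('objectJson', f.to_json(f.struct(
    f.col('displayname'),
    f.col('objectname'),
    f.col('objectId'),
    f.col('objectDefinition')
))).select('accounts', 'name', 'objectJson')
# objectJson is now a string like '{"displayname":"displayname",...}'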
Hope it helps.