
ETL: Flatten a nested array in an AWS Glue job

I am currently trying to import data stored as JSON using AWS Glue. The JSON documents contain an attribute 'tags' defined as an array of strings. I have already imported the table without the tags. I would now like to import the tags attribute into a separate table, so that I end up with a clean one-to-many relationship. After looking through the documentation, I can't see how to do that using the awsglue framework. Any ideas?

Hugo

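Use the relationalize method in Glue. It flattens nested structures and pivots each array out into its own table, which gives you the one-to-many layout you are after.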

https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html

There is also an example on the AWS GitHub.
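As a rough sketch of how that call might look inside a job: the helper below takes an existing DynamicFrame and returns every table that relationalize produces, keyed by name. The function name, the `root_name` default and the staging path are made up for illustration; only `relationalize`, `keys` and `select` are Glue calls.

```python
def relationalize_tables(dyf, staging_path, root_name="root"):
    """Split a Glue DynamicFrame with nested arrays into flat tables.

    dyf          -- an awsglue DynamicFrame, e.g. read from the Data Catalog
    staging_path -- S3 path relationalize uses for intermediate output
    root_name    -- hypothetical name chosen here for the top-level table
    """
    # relationalize returns a DynamicFrameCollection: one frame for the
    # root table plus one frame per nested array it pivoted out.
    dfc = dyf.relationalize(root_name, staging_path)

    # Return all frames by name rather than hard-coding the child table name.
    return {name: dfc.select(name) for name in dfc.keys()}
```

In a job you would call something like `tables = relationalize_tables(dyf, "s3://your-bucket/tmp/")` and then inspect `tables.keys()`. Going by the legislators sample linked above, the tags table should come out as `root_tags`, with `id` and `index` columns that join back to the `tags` column of the root table, but check the actual names and schema in your own run before writing the frames out.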
