
AWS Glue triggering job

I have modified a Glue-generated script that I use for transforming and manipulating the data. I want to trigger the same job for every new table that appears in the catalog, without manually changing the table name in the job script. In short, how can I run the same transformation the script performs on every new table that appears in the Data Catalog, without editing the table name each time?

Thanks

You can use the catalog client to dynamically get the list of tables in the database. I don't know how to get the catalog client in PySpark, but in Scala it looks like this:

import scala.collection.JavaConverters._

// Obtain the catalog client from the job's GlueContext
val catalog = glueContext.getCatalogClient

// List every table in the database and apply the same transformation to each one
for (table <- catalog.listTables("myDatabaseName", "").getTableList.asScala) {
    // do your transformation on `table`
}
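
For a PySpark job, a minimal sketch of the same idea is to list the tables through the boto3 Glue client and feed each table name into the existing transformation. This is an assumption-laden sketch, not part of the original answer: the database name "myDatabaseName" and the transform() placeholder are hypothetical and should be replaced with your own.

import boto3
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())
glue_client = boto3.client("glue")

# Page through every table registered in the catalog database
# ("myDatabaseName" is a placeholder)
paginator = glue_client.get_paginator("get_tables")
for page in paginator.paginate(DatabaseName="myDatabaseName"):
    for table in page["TableList"]:
        # Load each table as a DynamicFrame and run the same transformation on it
        dyf = glue_context.create_dynamic_frame.from_catalog(
            database="myDatabaseName",
            table_name=table["Name"],
        )
        # transform(dyf)  # placeholder for the job's existing transformation logic

With this pattern the script never hard-codes a table name, so a trigger can simply re-run the same job and it will pick up whatever tables currently exist in the catalog.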
