
AWS Glue. How to create a compound key for Job bookmarks?

I have a JDBC source (PostgreSQL) with a table that I want to read with AWS Glue.

My table has columns:

id          (bigint)
name        (string)
updated_at  (timestamp)

I've set up the table in the Glue data catalog with a crawler, set up a job and enabled Job bookmarks.

When I run the job, it automatically detects new rows by their new id values.

But I want to use a compound key instead: [ id + updated_at ].

That would allow me to detect all updates in the source table.

How can I do it?

AWS docs say that this feature is available ( https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html ):

For JDBC sources, the following rules apply:
   * For each table, AWS Glue uses one or more columns as bookmark keys to determine new and processed data. The bookmark keys combine to form a single compound key.
   * You can specify the columns to use as bookmark keys. If you don't specify bookmark keys, AWS Glue by default uses the primary key as the bookmark key, provided that it is sequentially increasing or decreasing (with no gaps).

Should I define the table manually (without crawlers)?

Thanks !

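# Pass the bookmark key columns explicitly through additional_options.
# Listing more than one column in jobBookmarkKeys forms a compound key.
datasource0 = glueContext.create_dynamic_frame.from_catalog(
    database="hr",
    table_name="emp",
    transformation_ctx="datasource0",  # required for bookmarks to track this source
    additional_options={
        "jobBookmarkKeys": ["empno"],
        "jobBookmarkKeysSortOrder": "asc",
    },
)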
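
Adapting the snippet above to the table in the question, a minimal sketch might look like the following. The helper names `bookmark_options` and `read_incremental` are hypothetical, invented here for illustration; only `jobBookmarkKeys`, `jobBookmarkKeysSortOrder`, `transformation_ctx` and `create_dynamic_frame.from_catalog` come from the snippet above. The Glue documentation expects user-specified bookmark keys to be strictly increasing or decreasing, so it is worth testing against your own data whether [ id + updated_at ] really picks up updated rows, or whether `updated_at` alone works better.

```python
def bookmark_options(keys, sort_order="asc"):
    """Build the additional_options dict for JDBC job bookmarks (hypothetical helper)."""
    if not keys:
        raise ValueError("at least one bookmark key column is required")
    if sort_order not in ("asc", "desc"):
        raise ValueError("sort_order must be 'asc' or 'desc'")
    return {
        "jobBookmarkKeys": list(keys),
        "jobBookmarkKeysSortOrder": sort_order,
    }


def read_incremental(glue_context, database, table_name, keys):
    """Read a catalog table with explicit bookmark keys.

    Meant to be called inside a Glue job, with the job's GlueContext
    passed in as glue_context.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        transformation_ctx="datasource0",
        additional_options=bookmark_options(keys),
    )


# Compound key from the question: id + updated_at
options = bookmark_options(["id", "updated_at"])
print(options)
```

Inside the job script this would be called as `read_incremental(glueContext, "my_db", "my_table", ["id", "updated_at"])`, where the database and table names are placeholders. Bookmark state is only saved when the script also calls `job.init(...)` at the start and `job.commit()` at the end.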
