
AWS DMS Ongoing Replication Falling Behind?

We are using AWS DMS for ongoing replication of specific tables from one Oracle RDS database instance to another Oracle RDS database instance (both 11g).

Intermittently, the replication seems to fall behind or get out of sync. There are no errors in the log and everything is reported as successful, but data is missing.

We can kick off a full refresh and the data will show up, but this isn't a viable option on a regular basis. This is a production system and a full refresh takes upwards of 14 hours.

We would like to monitor whether the destination database is [at least mostly] up to date. Meaning, no more than 2-3 hours behind.

I've found that you can get the current SCN from the source database using "SELECT current_scn FROM V$DATABASE" and from the target in the "awsdms_txn_state" table.

However, that table doesn't exist and I don't see any option to enable TaskRecoveryTableEnabled when creating or modifying a task.

Is there an existing feature that will automatically monitor these values? Can it be done through Lambda?

If DMS is reporting success, then we have no way of knowing that our data is hours or days behind until someone calls us complaining.

I do see an option in the DMS task to "Enable validation", but intuition tells me that's going to create a significant amount of unwanted overhead.

Thanks in advance.

There are two questions here:

  1. Task Monitoring of CDC Latency
  2. How to set TaskRecoveryTableEnabled

For the first, DMS task monitoring provides a number of CloudWatch metrics (see all of the CDC* metrics).

These metrics show when the target is out of sync with the source, and where in the replication instance's pipeline the pending changes are sitting. The detailed blog post from AWS explaining these task monitoring metrics is worth reading.

One option is to put a CloudWatch Alarm on the CDCLatencySource.
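As a rough sketch of that alarm with boto3: the helper names, the alarm name, the 5-minute period, the three evaluation periods, and treating missing data as breaching are all illustrative assumptions, and the 2-hour threshold comes from the question. DMS task metrics live in the `AWS/DMS` namespace and `CDCLatencySource` is reported in seconds; check the exact dimension values for your task in the CloudWatch console, since the task dimension may hold the task's resource ID rather than the name you gave it.

```python
def build_latency_alarm(task_id, instance_id, sns_topic_arn,
                        max_lag_seconds=2 * 60 * 60):
    """Build put_metric_alarm arguments for a DMS CDC latency alarm.

    task_id / instance_id must match the dimension values CloudWatch
    shows for the task (assumption: verify them in the console).
    """
    return {
        "AlarmName": f"dms-{task_id}-cdc-latency-source",  # hypothetical name
        "Namespace": "AWS/DMS",
        "MetricName": "CDCLatencySource",  # reported in seconds
        "Dimensions": [
            {"Name": "ReplicationInstanceIdentifier", "Value": instance_id},
            {"Name": "ReplicationTaskIdentifier", "Value": task_id},
        ],
        "Statistic": "Maximum",
        "Period": 300,             # assumption: evaluate 5-minute windows
        "EvaluationPeriods": 3,    # assumption: 15 minutes over threshold
        "Threshold": float(max_lag_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        # Assumption: a task that stops reporting should also raise the alarm.
        "TreatMissingData": "breaching",
        "AlarmActions": [sns_topic_arn],
    }


def create_latency_alarm(task_id, instance_id, sns_topic_arn):
    """Create (or update) the alarm in the current AWS account and region."""
    import boto3  # imported here so the builder above can be tested offline

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.put_metric_alarm(
        **build_latency_alarm(task_id, instance_id, sns_topic_arn)
    )
```

The same builder can be pointed at `CDCLatencyTarget` by changing `MetricName` if you want to alarm on apply lag at the target as well.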

Alternatively, you can create your own Lambda on a CloudWatch schedule that runs your SCN queries against the source and the target and publishes the result as a custom CloudWatch metric using PutMetricData. You can then create a CloudWatch Alarm on that metric to fire when the two drift out of sync.
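A minimal sketch of the publishing half of such a Lambda is below. The namespace `Custom/DMS`, the metric name `ScnGap`, the `TaskName` dimension, and both function names are made up for illustration. Fetching the two SCNs is left out because it depends on how the function connects to Oracle, so they are passed in as plain integers.

```python
def build_scn_gap_metric(source_scn, target_scn, task_name):
    """Build put_metric_data arguments describing how far the target lags.

    The gap is expressed in SCNs, not in time; a negative gap (target
    read after source moved on) is clamped to zero.
    """
    gap = max(source_scn - target_scn, 0)
    return {
        "Namespace": "Custom/DMS",  # hypothetical custom namespace
        "MetricData": [
            {
                "MetricName": "ScnGap",  # hypothetical metric name
                "Dimensions": [{"Name": "TaskName", "Value": task_name}],
                "Value": float(gap),
                "Unit": "Count",
            }
        ],
    }


def publish_scn_gap(cloudwatch, source_scn, target_scn, task_name):
    """Send the metric through a CloudWatch client.

    Inside a Lambda handler, pass boto3.client("cloudwatch") as cloudwatch.
    """
    params = build_scn_gap_metric(source_scn, target_scn, task_name)
    cloudwatch.put_metric_data(**params)
    return params
```

Because an SCN gap is a count rather than a duration, the alarm threshold has to be calibrated against how quickly SCNs advance on your source database.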

For the second question, to set TaskRecoveryTableEnabled via the console, tick the option "Create recovery table on target DB".

(Screenshot: Create recovery table on target DB)

After ticking this, you can confirm that TaskRecoveryTableEnabled is set to Yes by looking at the Overview tab of the task. At the bottom there is the Task Settings JSON, which will contain something like:

    "TargetMetadata": {
        "TargetSchema": "",
        "SupportLobs": true,
        "FullLobMode": false,
        "LobChunkSize": 0,
        "LimitedSizeLobMode": true,
        "LobMaxSize": 32,
        "InlineLobMaxSize": 0,
        "LoadMaxFileSize": 0,
        "ParallelLoadThreads": 0,
        "ParallelLoadBufferSize": 0,
        "BatchApplyEnabled": false,
        "TaskRecoveryTableEnabled": true
    }

(Screenshot: Task settings in the console)
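If you would rather flip the flag outside the console, the same setting can in principle be changed through the API. The sketch below is an untested outline using boto3's `describe_replication_tasks` and `modify_replication_task`; the two function names are hypothetical, and it assumes the task is stopped, since DMS generally does not allow a running task to be modified.

```python
import json


def with_recovery_table(settings_json, enabled=True):
    """Return a copy of the task settings JSON with the recovery table flag set."""
    settings = json.loads(settings_json)
    settings.setdefault("TargetMetadata", {})["TaskRecoveryTableEnabled"] = enabled
    return json.dumps(settings)


def enable_recovery_table(dms, task_arn):
    """Read the task's current settings, set the flag, and write them back.

    dms is a boto3 DMS client, e.g. boto3.client("dms").
    Assumption: the task is stopped before this is called.
    """
    response = dms.describe_replication_tasks(
        Filters=[{"Name": "replication-task-arn", "Values": [task_arn]}]
    )
    current = response["ReplicationTasks"][0]["ReplicationTaskSettings"]
    patched = with_recovery_table(current)
    dms.modify_replication_task(
        ReplicationTaskArn=task_arn,
        ReplicationTaskSettings=patched,
    )
    return patched
```

Patching the full settings document, rather than sending only the one key, avoids relying on how DMS merges partial settings.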

