
Near real time replica from Azure VM SQL Server/DB into Azure Blob Storage (CSV/JSON)

I am trying to achieve a near-real-time replica (ideally within ~5 minutes) of data from a source system (an Azure VM running SQL Server, with read-only access to about 100 tables) into an Azure Storage Account (Gen 2, Blob folders) to support various downstream data workloads.

I had considered using Azure Data Factory to carry out an initial batch load of the historical data (this takes ~40 minutes with ADF), followed by an incremental "update" of the sink whenever source tables change (updates or inserts).
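For context, the incremental step would typically rely on a high-watermark query of the kind sketched below (a hypothetical example; dbo.Orders, LastUpdatedOn, and the connection details are illustrative placeholders, not names from the actual source). The challenges that follow are exactly what breaks this pattern:

```python
# Illustrative sketch of the watermark pattern an ADF incremental copy
# typically relies on; dbo.Orders, LastUpdatedOn, and the connection
# string are placeholders, not names from the actual source system.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};SERVER=myvm.example.com;"
    "DATABASE=SourceDb;UID=reader;PWD=...;TrustServerCertificate=yes")
cur = conn.cursor()

last_watermark = "2024-01-01T00:00:00"  # persisted from the previous run
cur.execute("SELECT * FROM dbo.Orders WHERE LastUpdatedOn > ?",
            last_watermark)
rows = cur.fetchall()  # only rows changed since the last run get copied
```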

The challenges are:

  1. Some of the source tables receive historical updates (e.g. a record dated two years ago is inserted).
  2. Some of the source tables are not transactional tables (they are lookup tables without timestamp columns; a "LastUpdatedOn" column, for example, does not exist in these tables).

What are the best possible approaches to establish this synchronization between source and sink?

You could start with Change Data Capture or Change Tracking, then run an SSIS job to write the data into blob storage. Or you could use something like Debezium.
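For illustration, here is a minimal polling sketch of the Change Tracking route, assuming read access to the source and a storage-account connection string; the table dbo.Orders with primary key OrderId, the container name "replica", and the 5-minute interval are hypothetical, not part of the original setup. Change Tracking needs only a primary key, not a timestamp column, so it also covers the lookup tables and the historical inserts mentioned in the question:

```python
# Hypothetical sketch: poll SQL Server Change Tracking and land changed rows
# as CSV in Azure Blob Storage. One-time setup, run once by a DBA:
#
#   ALTER DATABASE SourceDb SET CHANGE_TRACKING = ON
#       (CHANGE_RETENTION = 2 DAYS, AUTO_CLEANUP = ON);
#   ALTER TABLE dbo.Orders ENABLE CHANGE_TRACKING;
import csv
import io
import time

import pyodbc
from azure.storage.blob import BlobServiceClient

SQL_CONN = ("DRIVER={ODBC Driver 18 for SQL Server};SERVER=myvm.example.com;"
            "DATABASE=SourceDb;UID=reader;PWD=...;TrustServerCertificate=yes")
BLOB_CONN = "DefaultEndpointsProtocol=https;AccountName=...;AccountKey=...;"

blob_service = BlobServiceClient.from_connection_string(BLOB_CONN)


def sync_once(last_version: int) -> int:
    """Upload rows changed since last_version as one CSV blob; return the
    new version to use as the next watermark."""
    with pyodbc.connect(SQL_CONN) as conn:
        cur = conn.cursor()
        new_version = cur.execute(
            "SELECT CHANGE_TRACKING_CURRENT_VERSION()").fetchval()
        # CHANGETABLE returns the primary keys of changed rows plus the
        # operation (I/U/D); join back to the table for the current values.
        cur.execute(f"""
            SELECT ct.SYS_CHANGE_OPERATION, ct.OrderId AS ChangedKey, t.*
            FROM CHANGETABLE(CHANGES dbo.Orders, {int(last_version)}) AS ct
            LEFT JOIN dbo.Orders AS t ON t.OrderId = ct.OrderId
        """)  # int() keeps the inlined version value safe to interpolate
        rows = cur.fetchall()
        if rows:
            buf = io.StringIO()
            writer = csv.writer(buf)
            writer.writerow([col[0] for col in cur.description])
            writer.writerows(rows)
            blob_service.get_blob_client(
                container="replica",
                blob=f"dbo.Orders/changes_{new_version}.csv",
            ).upload_blob(buf.getvalue(), overwrite=True)
    return new_version


if __name__ == "__main__":
    version = 0  # 0 = everything; persist the returned value between runs
    while True:
        version = sync_once(version)
        time.sleep(300)  # ~5-minute latency target from the question
```

Change Tracking records which rows changed (by primary key) rather than their historical values, which is sufficient when the sink only needs the current state; CDC additionally captures the before/after values, and Debezium streams changes continuously instead of polling.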
