How to continually migrate data from on-premises SQL Db to Azure SQL Db
As part of an Azure Machine Learning process, I need to continually migrate data from an on-premises SQL Db to an Azure SQL Db using Data Management Gateway.
This official Azure article describes how to: Move data from an on-premises SQL server to SQL Azure with Azure Data Factory. But the details are a bit confusing to me. If someone were to briefly describe the process, how would they do it? What are the 2-3 main steps one needs to perform on-premises, and the 2-3 steps on the Azure Cloud? No details are needed.

Note: The solution has to involve using Data Management Gateway.
Based on the Azure documentation, you can use "slices". You can perform a "delta" fetch using a timestamp column, as mentioned in that article, or using a sequential integer column. To avoid rows being missed from a slice because the on-premises server's system clock runs slightly behind the Azure system clock, it is better to use a sequential integer. The Input dataset below shows how to define slices:
{
    "name": "AzureBlobInput",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "StorageLinkedService",
        "typeProperties": {
            "folderPath": "mycontainer/myfolder/{Year}/{Month}/{Day}/",
            "partitionedBy": [
                { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } },
                { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } },
                { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } }
            ],
            "format": {
                "type": "TextFormat"
            }
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
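Since the question is about an on-premises SQL Db rather than blob storage, the input side would instead be an on-premises SQL Server dataset whose linked service references the Data Management Gateway. A minimal sketch, assuming placeholder names (`OnPremisesSqlLinkedService`, the gateway name, the connection string, and `MyTable` are all illustrative, not from the article):

```json
{
    "name": "OnPremisesSqlLinkedService",
    "properties": {
        "type": "OnPremisesSqlServer",
        "typeProperties": {
            "connectionString": "Data Source=<servername>;Initial Catalog=<databasename>;Integrated Security=False;User ID=<username>;Password=<password>;",
            "gatewayName": "<gatewayname>"
        }
    }
}
```

```json
{
    "name": "SqlServerInput",
    "properties": {
        "type": "SqlServerTable",
        "linkedServiceName": "OnPremisesSqlLinkedService",
        "typeProperties": {
            "tableName": "MyTable"
        },
        "external": true,
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
```

The `gatewayName` property is what ties the linked service to the Data Management Gateway installed on the on-premises machine.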
You can create an activity and use its scheduler section to specify a schedule for the activity. You can specify "frequency" (minute, hour, day, etc.) and "interval":
"scheduler": {
"frequency": "Hour",
"interval": 1
}
Each unit of data consumed or produced by an activity run is called a data slice. The following diagram shows an example of an activity with one input dataset and one output dataset:

The diagram shows the hourly data slices for the input and output datasets, with three input slices ready for processing. The 10-11 AM activity is in progress, producing the 10-11 AM output slice.
You can access the time interval associated with the current slice in the dataset JSON by using the variables SliceStart and SliceEnd. You can use these variables in your activity JSON to select data from an input dataset representing time-series data (for example, 8 AM to 9 AM).
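Putting this together, a copy activity can use `$$Text.Format` with SliceStart and SliceEnd in its source query to fetch only the current slice's delta. A sketch, assuming hypothetical dataset names (`SqlServerInput`, `AzureSqlOutput`) and a hypothetical watermark column `LastModifiedDate` on `MyTable`:

```json
{
    "name": "CopySliceFromSqlServerToAzureSql",
    "type": "Copy",
    "inputs": [ { "name": "SqlServerInput" } ],
    "outputs": [ { "name": "AzureSqlOutput" } ],
    "typeProperties": {
        "source": {
            "type": "SqlSource",
            "sqlReaderQuery": "$$Text.Format('SELECT * FROM MyTable WHERE LastModifiedDate >= \\'{0:yyyy-MM-dd HH:mm}\\' AND LastModifiedDate < \\'{1:yyyy-MM-dd HH:mm}\\'', SliceStart, SliceEnd)"
        },
        "sink": {
            "type": "SqlSink"
        }
    },
    "scheduler": {
        "frequency": "Hour",
        "interval": 1
    }
}
```

If you prefer the sequential-integer approach recommended above, the same pattern applies: query a watermark table for the last copied ID and filter `WHERE SequenceId > <last copied id>` instead of filtering on the timestamp column.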
You can also set the start date for the pipeline in the past, as shown here. When you do so, Data Factory automatically calculates (back fills) all data slices in the past and begins processing them.
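The backfill is driven by the pipeline's `start` and `end` properties; setting `start` in the past makes Data Factory generate and process all the slices between the two dates. A minimal sketch (the pipeline name and dates are illustrative, and the activities array is left empty here):

```json
{
    "name": "SqlToAzureSqlPipeline",
    "properties": {
        "description": "Copy hourly slices from on-premises SQL to Azure SQL",
        "activities": [
        ],
        "start": "2017-01-01T00:00:00Z",
        "end": "2017-01-08T00:00:00Z"
    }
}
```

With an hourly activity schedule, this window would produce 24 slices per day for the seven days between `start` and `end`, processed in order from the oldest slice forward.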