Azure storage incremental copy by modified date
I need to copy one storage account into another. I have created a Runbook and scheduled it to run daily. This is an incremental copy.
What I am doing is:

Start-AzureStorageBlobCopy
While this works for small containers, it takes a very long time and is certainly cost-ineffective for containers with, say, 10 million block blobs, because every time I run the task I have to go through all 10 million blobs.
I don't see it in the documentation, but is there any way I can use conditional headers like DateModifedSince in PowerShell, something like:

Get-AzureStorageBlob -DateModifiedSince date
I have not tried it, but I can see that it is possible to use DateModifiedSince in the Node.js library.
Is there any way I can do it with PowerShell, so that I can use Runbooks?
EDIT:
Using AzCopy, I made a copy of a storage account that contains 7 million blobs, uploaded a few new blobs, and started AzCopy again. It still takes a significant amount of time to copy the few newly uploaded files.
AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /XN /Y
It is possible to filter for a blob by name almost instantly. For example, Get-AzureStorageBlob -Blob will return the blob immediately from 7 million records. It should have been possible to filter blobs by other properties too.
I am not sure if this is the actual correct answer, but I have resorted to this solution for now. AzCopy is a bit faster, but since it is an executable I have no option to use it in Automation.
I wrote my own runbook (it can be modified into a workflow) which implements the following AzCopy command:

AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /Y
A code snippet:
# Loop through the source container blobs and copy to the destination
# any blob that is missing there or has been modified since the last copy
$MaxReturn = 20000
$Total = 0
$Token = $null
$FilesTransferred = 0
$FilesTransferSuccess = 0
$FilesTransferFail = 0
$sw = [Diagnostics.Stopwatch]::StartNew()

# Build a name -> LastModified lookup of the destination blobs once up front;
# continuation tokens are account-specific, so the destination must be paged
# with its own token rather than reusing the source token
$DestBlobsHash = @{}
$DestToken = $null
Do
{
    $DestBlobs = Get-AzureStorageBlob -Context $destContext -Container $container -MaxCount $MaxReturn -ContinuationToken $DestToken
    $DestBlobs | ForEach-Object { $DestBlobsHash[$_.Name] = $_.LastModified.UtcDateTime }
    if($DestBlobs.Count -le 0) {
        Break
    }
    $DestToken = $DestBlobs[$DestBlobs.Count - 1].ContinuationToken
}
While ($DestToken -ne $null)

Do
{
    $SrcBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container -MaxCount $MaxReturn -ContinuationToken $Token
    $Total += $SrcBlobs.Count
    if($SrcBlobs.Count -le 0) {
        Break
    }
    $Token = $SrcBlobs[$SrcBlobs.Count - 1].ContinuationToken
    ForEach ($SrcBlob in $SrcBlobs) {
        # Copy the blob if it is absent from the destination or the source copy is newer
        $CopyThisBlob = $false
        if(!$DestBlobsHash.ContainsKey($SrcBlob.Name)) {
            $CopyThisBlob = $true
        } elseif($SrcBlob.LastModified.UtcDateTime -gt $DestBlobsHash[$SrcBlob.Name]) {
            $CopyThisBlob = $true
        }
        if($CopyThisBlob) {
            $blobToCopy = $SrcBlob.Name
            "Copying blob: $blobToCopy to destination"
            $FilesTransferred++
            try {
                $c = Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $blobToCopy -DestContainer $container -DestBlob $blobToCopy -SrcContext $sourceContext -DestContext $destContext -Force
                $FilesTransferSuccess++
            } catch {
                Write-Error "$blobToCopy transfer failed"
                $FilesTransferFail++
            }
        }
    }
}
While ($Token -ne $null)
$sw.Stop()
"Total blobs in container $container : $Total"
"Total files transferred: $FilesTransferred"
"Transferred successfully: $FilesTransferSuccess"
"Transfer failed: $FilesTransferFail"
"Elapsed time: $($sw.Elapsed) `n"
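A possible refinement (an untested sketch): persist the time of the last successful run in an Automation variable and skip any source blob that has not been modified since, so unchanged blobs cost only a date comparison instead of a hash lookup against the full destination listing. Get-AutomationVariable and Set-AutomationVariable are the standard Azure Automation cmdlets; the variable name LastSyncTimeUtc is my own assumption:

```powershell
# Read the checkpoint from an Automation variable (hypothetical name)
$lastSync = [DateTime]::Parse((Get-AutomationVariable -Name 'LastSyncTimeUtc')).ToUniversalTime()

# Inside the source loop, skip blobs not modified since the last run:
#   if ($SrcBlob.LastModified.UtcDateTime -le $lastSync) { continue }

# After a fully successful pass, record the new checkpoint
Set-AutomationVariable -Name 'LastSyncTimeUtc' -Value ([DateTime]::UtcNow.ToString('o'))
```

Listing still enumerates every blob, but the copy work shrinks to just the blobs changed since the checkpoint.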
Last modified is stored in the ICloudBlob object; you can access it with PowerShell like this:

$blob = Get-AzureStorageBlob -Context $Context -Container $container
$blob[1].ICloudBlob.Properties.LastModified
Which will give you:
DateTime      : 31/03/2016 17:03:07
UtcDateTime   : 31/03/2016 17:03:07
LocalDateTime : 31/03/2016 18:03:07
Date          : 31/03/2016 00:00:00
Day           : 31
DayOfWeek     : Thursday
DayOfYear     : 91
Hour          : 17
Millisecond   : 0
Minute        : 3
Month         : 3
Offset        : 00:00:00
Second        : 7
Ticks         : 635950405870000000
UtcTicks      : 635950405870000000
TimeOfDay     : 17:03:07
Year          : 2016
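Building on that property, a client-side filter is possible. This is only a sketch: the whole container is still enumerated server-side, and $Context, $container, and the one-day cutoff are assumptions for illustration:

```powershell
# Client-side filter: list everything, then keep only the blobs
# modified after the cutoff. The full container is still enumerated.
$cutoff = (Get-Date).AddDays(-1).ToUniversalTime()
Get-AzureStorageBlob -Context $Context -Container $container |
    Where-Object { $_.ICloudBlob.Properties.LastModified.UtcDateTime -gt $cutoff }
```

So this avoids copying unchanged blobs, but not the cost of listing them.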
Having read through the API, I don't think it is possible to perform a search on the container with any parameter other than name. I can only imagine that the Node.js library still retrieves all blobs and then filters them.
I will dig into it a little bit more, though.