
Azure storage incremental copy by modified date

I need to copy one storage account into another. I have created a Runbook and scheduled it to run daily. This is an incremental copy.

What I am doing is:

  1. List the blobs in the source storage container.
  2. Check the blobs in the destination storage container.
  3. If a blob doesn't exist in the destination container, copy it with Start-AzureStorageBlobCopy.
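The per-blob decision above can be sketched independently of PowerShell. Here is a minimal Python sketch (the blob names and dates are hypothetical), extended to also re-copy blobs that are newer in the source, which an incremental copy ultimately needs:

```python
from datetime import datetime, timezone

def should_copy(name, src_modified, dest_index):
    """Decide whether a source blob needs copying.

    dest_index maps destination blob name -> last-modified datetime.
    Copy when the blob is missing from the destination, or when the
    source copy is newer.
    """
    dest_modified = dest_index.get(name)
    if dest_modified is None:
        return True
    return src_modified > dest_modified

# Hypothetical destination listing
dest_index = {"a.txt": datetime(2016, 3, 31, tzinfo=timezone.utc)}

print(should_copy("b.txt", datetime(2016, 4, 1, tzinfo=timezone.utc), dest_index))  # True: missing
print(should_copy("a.txt", datetime(2016, 3, 1, tzinfo=timezone.utc), dest_index))  # False: unchanged
```

The expensive part is not this comparison but building `dest_index` in the first place, since it requires enumerating the destination container.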

While this works for small containers, it takes a very long time and is certainly cost-ineffective for containers with, say, 10 million block blobs, because every time I run the task I have to go through all 10 million blobs.

I don't see it in the documentation, but is there any way I can use conditional headers such as DateModifiedSince in PowerShell, something like Get-AzureStorageBlob -DateModifiedSince date?

I have not tried it, but I can see that it is possible to use DateModifiedSince in the Node.js library.

Is there any way I can do this with PowerShell so that I can use Runbooks?

EDIT:

Using AzCopy, I made a copy of a storage account that contains 7 million blobs. I uploaded a few new blobs and started AzCopy again. It still takes a significant amount of time to copy the few newly uploaded files.

    AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /XN /Y

It is possible to filter for a blob by blob name almost instantly.

For example, Get-AzureStorageBlob -Blob will return the blob immediately out of 7 million records.

It should have been possible to filter blobs by other properties too.

I am not sure if this is the actual correct answer, but I have resorted to this solution for now.

AzCopy is a bit faster, but since it's an executable I have no option to use it in Automation.

I wrote my own Runbook (it can be modified as a workflow) which implements the following AzCopy command:

    AzCopy /Source:$sourceUri /Dest:$destUri /SourceKey:$sourceStorageKey /DestKey:$destStorageAccountKey /S /XO /Y

  1. Looking at List Blobs, we can only filter blobs by blob prefix, so I cannot pull blobs filtered by modified date. That leaves me pulling the whole blob list.
  2. I pull 20,000 blobs at a time from the source and the destination with Get-AzureStorageBlob and a ContinuationToken.
  3. I loop through the 20,000 pulled source blobs and check whether each one is missing from the destination or has been modified in the source.
  4. If the check in step 3 is true, I write those blobs to the destination.
  5. It takes around 3-4 hours to go through 7 million blobs. The task runs longer depending on how many blobs have to be written to the destination.

A code snippet:

    # Loop through the source container blobs and copy to the destination
    # any blob that is not already there or is newer in the source
    $MaxReturn = 20000
    $Total = 0
    $Token = $null
    $FilesTransferred = 0;
    $FilesTransferSuccess = 0;
    $FilesTransferFail = 0;
    $sw = [Diagnostics.Stopwatch]::StartNew();
    DO
    {
        $SrcBlobs = Get-AzureStorageBlob -Context $sourceContext -Container $container -MaxCount $MaxReturn  -ContinuationToken $Token | 
            Select-Object -Property Name, LastModified, ContinuationToken

        # Build a lookup of destination blob name -> last-modified time.
        # Note: this reuses the source continuation token, so it assumes both
        # containers page identically; a blob sitting on a different
        # destination page would be re-copied.
        $DestBlobsHash = @{}
        Get-AzureStorageBlob -Context $destContext -Container $container -MaxCount $MaxReturn  -ContinuationToken $Token  | 
            Select-Object -Property Name, LastModified, ContinuationToken  | 
                ForEach-Object { $DestBlobsHash[$_.Name] = $_.LastModified.UtcDateTime }


        $Total += $SrcBlobs.Count

        if($SrcBlobs.Length -le 0) { 
            Break;
        }
        $Token = $SrcBlobs[$SrcBlobs.Count -1].ContinuationToken;

        ForEach ($SrcBlob in $SrcBlobs){
            # Copy if the blob is missing from the destination page
            # or the source copy has been modified more recently
            $CopyThisBlob = $false

            if($DestBlobsHash.Count -eq 0){
                $CopyThisBlob = $true
            } elseif(!$DestBlobsHash.ContainsKey($SrcBlob.Name)){
                $CopyThisBlob = $true
            } elseif($SrcBlob.LastModified.UtcDateTime -gt $DestBlobsHash.Item($SrcBlob.Name)){
                $CopyThisBlob = $true
            }

            if($CopyThisBlob){
                #Start copying the blobs to container
                $blobToCopy = $SrcBlob.Name
                "Copying blob: $blobToCopy to destination"
                $FilesTransferred++
                try {
                    $c = Start-AzureStorageBlobCopy -SrcContainer $container -SrcBlob $blobToCopy  -DestContainer $container -DestBlob $blobToCopy -SrcContext $sourceContext -DestContext $destContext -Force
                    $FilesTransferSuccess++
                } catch {
                    Write-Error "$blobToCopy transfer failed"
                    $FilesTransferFail++
                }   
            }           
        }
    }
    While ($Token -ne $Null)
    $sw.Stop()
    "Total blobs in container $container : $Total"
    "Total files transferred: $FilesTransferred"
    "Transfer successfully: $FilesTransferSuccess"
    "Transfer failed: $FilesTransferFail"
    "Elapsed time: $($sw.Elapsed) `n"
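For readers not fluent in PowerShell, the paging pattern the runbook relies on (fetch a fixed-size page, remember where the listing stopped, repeat until exhausted) can be illustrated with a small Python sketch. The list stands in for the service-side blob listing, and the integer token is a simplification of the opaque continuation token the service actually returns:

```python
def list_in_pages(all_blobs, page_size):
    """Yield fixed-size pages from a listing, mirroring the
    Get-AzureStorageBlob -MaxCount/-ContinuationToken loop.

    A real continuation token is an opaque service value; a plain
    index is enough for this sketch.
    """
    token = 0
    while token < len(all_blobs):
        page = all_blobs[token:token + page_size]
        token += len(page)
        yield page

pages = list(list_in_pages(["blob%d" % i for i in range(7)], 3))
print([len(p) for p in pages])  # [3, 3, 1]
```

With 7 million blobs and a page size of 20,000, this loop runs 350 times per container, which is why the runbook's wall-clock time is dominated by listing rather than copying.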

The last-modified date is stored in the ICloudBlob object; you can access it with PowerShell like this:

    $blob = Get-AzureStorageBlob -Context $Context -Container $container
    $blob[1].ICloudBlob.Properties.LastModified

Which will give you:

    DateTime      : 31/03/2016 17:03:07
    UtcDateTime   : 31/03/2016 17:03:07
    LocalDateTime : 31/03/2016 18:03:07
    Date          : 31/03/2016 00:00:00
    Day           : 31
    DayOfWeek     : Thursday
    DayOfYear     : 91
    Hour          : 17
    Millisecond   : 0
    Minute        : 3
    Month         : 3
    Offset        : 00:00:00
    Second        : 7
    Ticks         : 635950405870000000
    UtcTicks      : 635950405870000000
    TimeOfDay     : 17:03:07
    Year          : 2016

Having read through the API, I don't think it is possible to perform a search on the container with any parameters other than name. I can only imagine that the Node.js library still retrieves all blobs and then filters them.
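Client-side filtering of a full listing (which is presumably what the Node.js library does) looks like this minimal Python sketch. The `(name, last_modified)` pairs stand in for whatever an SDK's list-blobs call pages back; the point is that every blob still has to be enumerated before the date filter applies:

```python
from datetime import datetime, timezone

def modified_since(blobs, cutoff):
    """Keep only blobs modified after `cutoff`.

    The Blob service itself only filters listings by name prefix,
    so this filter can only run after the blobs have been enumerated.
    """
    return [name for name, last_modified in blobs if last_modified > cutoff]

listing = [
    ("old.log", datetime(2016, 1, 1, tzinfo=timezone.utc)),
    ("new.log", datetime(2016, 4, 1, tzinfo=timezone.utc)),
]
print(modified_since(listing, datetime(2016, 3, 1, tzinfo=timezone.utc)))  # ['new.log']
```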

I will dig into it a little more, though.
