
Copy different types of files from Azure Data Lake Gen1 to Azure Data Lake Gen2 while keeping attributes (like last updated time)

I need to migrate all my data from Azure Data Lake Gen1 to Data Lake Gen2. Our lake holds a mix of file types (.txt, .zip, .json and many others). We want to move them as-is to the Gen2 lake, and we also want every file to keep the same last updated time it had in the Gen1 lake.

I was looking at ADF for this use case. But ADF requires a dataset, and a dataset requires a data format (Avro, JSON, XML, binary, etc.). Because our data is a mix of types, I tried the binary format. With the binary format, however, every file at the destination has the content type "application/octet-stream". I am also unable to retain the file update time.

As you said, when the files are copied to Data Lake Gen2, their properties change, including the 'LAST MODIFIED' time.

As with a file upload, these files are newly created in Gen2, and Azure creates new properties for them. That is why we cannot keep the old properties from Gen1.

When the dataset uses the binary format, every file gets the content type application/octet-stream, and the copy gives us no way to change it.

The difference in properties between Gen1 and Gen2 (I copied the files from Gen1 to Gen2): [image: enter image description here]

Only if we download the 'word.csv' file and re-upload it does the content type change to application/vnd.ms-excel:

[image: enter image description here]
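If a fix-up pass after the copy is acceptable, one option outside ADF is to derive a content type from each file's extension and set it on the Gen2 file. The sketch below is only an illustration under assumptions: `guess_content_type` and `apply_content_type` are hypothetical helper names, the extension-to-type mapping comes from Python's standard `mimetypes` module (so it may not match what the portal assigns on upload), and `file_client` is assumed to be a `DataLakeFileClient` from the `azure-storage-file-datalake` package.

```python
import mimetypes


def guess_content_type(path: str) -> str:
    """Guess a content type from the file extension (hypothetical helper)."""
    guessed, _encoding = mimetypes.guess_type(path)
    # Fall back to the generic type that the binary copy assigns.
    return guessed or "application/octet-stream"


def apply_content_type(file_client, path: str) -> str:
    """Set the guessed content type on an existing Gen2 file.

    Assumes file_client is an azure.storage.filedatalake.DataLakeFileClient.
    The import is kept local so the guessing logic works without the SDK.
    """
    from azure.storage.filedatalake import ContentSettings

    content_type = guess_content_type(path)
    # Note: this call replaces the file's content settings as a whole,
    # so other settings (encoding, language, ...) are not preserved.
    file_client.set_http_headers(
        content_settings=ContentSettings(content_type=content_type)
    )
    return content_type
```

Changing the headers is itself a modification, so this does nothing for the 'LAST MODIFIED' time.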

HTH.

Last Modified Time is system metadata that records when the file was last modified in the filesystem/container, and it cannot be updated. A workaround is to add user metadata that captures the metadata from the source; PowerShell or the .NET or Java SDK can be used to set the additional property. The workaround is implemented in PowerShell below:

[image: enter image description here]
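Since the PowerShell version survives only as a screenshot, here is a rough Python sketch of the same idea: store the Gen1 modification time as user metadata on the Gen2 file. It is not the answerer's script. The helper names and the `source_last_modified` key are hypothetical, the Gen1 timestamp is assumed to arrive as epoch milliseconds (as in WebHDFS-style file status), and `file_client` is assumed to be a `DataLakeFileClient` from the `azure-storage-file-datalake` package.

```python
from datetime import datetime, timezone


def build_source_metadata(modification_time_ms: int) -> dict:
    """Turn a Gen1 modification time (epoch milliseconds) into user metadata."""
    modified = datetime.fromtimestamp(modification_time_ms / 1000, tz=timezone.utc)
    # "source_last_modified" is a hypothetical key name chosen for this sketch.
    return {"source_last_modified": modified.strftime("%Y-%m-%dT%H:%M:%SZ")}


def tag_file(file_client, modification_time_ms: int) -> dict:
    """Merge the source timestamp into the file's existing user metadata.

    Assumes file_client is an azure.storage.filedatalake.DataLakeFileClient.
    Setting metadata replaces the whole set, so existing keys are read
    first and written back together with the new one.
    """
    existing = file_client.get_file_properties().metadata or {}
    merged = {**existing, **build_source_metadata(modification_time_ms)}
    file_client.set_metadata(merged)
    return merged


def connect_file_client(account_url: str, credential, file_system: str, path: str):
    """Build a file client; requires the azure-storage-file-datalake package."""
    from azure.storage.filedatalake import DataLakeServiceClient

    service = DataLakeServiceClient(account_url=account_url, credential=credential)
    return service.get_file_system_client(file_system).get_file_client(path)
```

The system 'LAST MODIFIED' value still reflects the copy (and the metadata update), so downstream consumers would have to read the user metadata key instead.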
