
How to copy data from an Azure file share to Azure Data Lake Store using multiple threads?

I have 2 TB of data in an Azure file share that I want to copy to Azure Data Lake Store, keeping the same directory structure. I tried:

az dls fs upload --account eanpdlstore2 --source-path "/root/mymountpoint/ShopperVisionDataRoot/" --destination-path "/pngcaseprocessing/ShopperVisionDataRoot"

But copying the data from the Azure file share to Azure Data Lake is taking forever. Can someone shed some light on how to get this to work, or suggest another feasible way to do it?

The directory structure is Main_dir/sub_dir/sub_dir/{multiple_data_folders}, and it should be copied the same way. However, I don't want to copy all the data folders under Main_dir/sub_dir/sub_dir/, only two in each sub_dir. So I tried this to copy those two to a different location:

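# For each first-level subdirectory, copy only the two wanted data folders.
find DATA_PREP_INPUT2 -maxdepth 1 -mindepth 1 -type d | while IFS= read -r subdir; do
  # Quote "$subdir" inside basename so names with spaces survive.
  mkdir -p DATA_PREP_INPUT_TEST/"$(basename "$subdir")" &&
  cp -n -r "$subdir"/{IPD_130288,IPD_130284} DATA_PREP_INPUT_TEST/"$(basename "$subdir")"/;
done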

Then I can use the az command above to upload the result. But this also takes too long, even to copy a single data directory.
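A minimal sketch of running the selective copy above with several `cp` processes at once, assuming bash and an `xargs` that supports `-0`, `-r` and `-P` (GNU findutils does). The function name `copy_selected` and its argument order are hypothetical, chosen only for illustration. Whether this is actually faster depends on the throughput of the mounted file share.

```shell
#!/usr/bin/env bash
# Sketch: copy only the named data folders from each first-level
# subdirectory of src_root into dst_root, running up to `jobs` copies at once.
# copy_selected is a hypothetical helper name, not an existing tool.
copy_selected() {
  local src_root=$1 dst_root=$2 jobs=$3
  shift 3                      # remaining arguments are the folder names
  local subdir folder
  find "$src_root" -maxdepth 1 -mindepth 1 -type d -print0 |
    while IFS= read -r -d '' subdir; do
      for folder in "$@"; do
        if [ -d "$subdir/$folder" ]; then
          # Emit NUL-separated (source, destination) pairs for xargs.
          printf '%s\0%s\0' "$subdir/$folder" "$dst_root/$(basename "$subdir")"
        fi
      done
    done |
    # Each sh invocation receives one pair: $1 = source, $2 = destination.
    xargs -0 -r -n 2 -P "$jobs" sh -c 'mkdir -p "$2" && cp -n -r "$1" "$2"/' _
}
```

With the names from the question, a call could look like `copy_selected DATA_PREP_INPUT2 DATA_PREP_INPUT_TEST 8 IPD_130288 IPD_130284`, where 8 is an arbitrary number of parallel jobs.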

As an alternative, you may wish to consider Azure Data Factory, provided you aren't hitting the limits for Azure file shares.

It does a great job of copying between data sources: it can run copies concurrently, and the execution location is automatically the one closest to your data sink (in most cases; it can be overridden if need be).

It has support for Azure File Shares: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-file-storage

And data lake store: https://docs.microsoft.com/en-us/azure/data-factory/connector-azure-data-lake-store

Here are some tips for performance tuning: https://docs.microsoft.com/en-us/azure/data-factory/copy-activity-performance

I don't believe that, at the time of writing, the "copy wizard" (which makes it easier to experiment) is available in V2, but V2 is the version you need for Azure File Shares.

Here's a getting started guide: https://docs.microsoft.com/en-us/azure/data-factory/
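If Data Factory is not an option, one thing to try with the CLI is to start one `az dls fs upload` per first-level subdirectory and run several of them at once. This is only a sketch under assumptions: it needs bash and an `xargs` with `-0`, `-r` and `-P`; the function name `upload_subdirs` and the default of 4 jobs are hypothetical; it uses only the flags already shown in the question; and it skips files that sit directly in the source root. It has not been measured against a real 2 TB share.

```shell
#!/usr/bin/env bash
# Sketch: run one `az dls fs upload` per first-level subdirectory of
# src_root, with up to `jobs` uploads running concurrently.
# upload_subdirs is a hypothetical helper name, not part of the Azure CLI.
upload_subdirs() {
  local account=$1 src_root=$2 dst_root=$3 jobs=${4:-4}
  find "$src_root" -maxdepth 1 -mindepth 1 -type d -print0 |
    # xargs appends the subdirectory as the last argument, so inside sh:
    #   $1 = account, $2 = destination root, $3 = local subdirectory
    xargs -0 -r -n 1 -P "$jobs" sh -c '
      name=$(basename "$3")
      az dls fs upload --account "$1" \
        --source-path "$3" \
        --destination-path "$2/$name"
    ' _ "$account" "$dst_root"
}
```

With the values from the question, a call could look like `upload_subdirs eanpdlstore2 /root/mymountpoint/ShopperVisionDataRoot /pngcaseprocessing/ShopperVisionDataRoot 4`. Each subdirectory keeps its name under the destination path, so the directory structure is preserved.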
