
Azure Data Lake Storage Gen2 (ADLS Gen2) as a data source for Kedro pipeline

According to Kedro's documentation, Azure Blob Storage is one of the available data sources. Does this extend to ADLS Gen2?

I haven't tried Kedro yet, but before I invest time in it, I want to make sure I can connect to ADLS Gen2.

Thank you in advance!

Yes, this works with Kedro. You're actually pointing to a really old version of the docs. Nowadays all filesystem-based datasets in Kedro use fsspec under the hood, which means they work seamlessly with S3, HDFS, local, and many more filesystems.

ADLS Gen2 is supported by fsspec via the underlying adlfs library; see the adlfs documentation for details.

From a Kedro point of view all you need to do is declare your catalog entry like so:

 motorbikes:
     type: pandas.CSVDataSet
     filepath: abfs://your_bucket/data/02_intermediate/company/motorbikes.csv
     credentials: dev_az
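
The `credentials: dev_az` key refers to an entry of the same name in your credentials file (typically `conf/local/credentials.yml`). Kedro passes that entry's keys to the fsspec filesystem, here adlfs. A minimal sketch, assuming authentication by storage account name and key; both values below are placeholders, and adlfs accepts other authentication options as well:

```yaml
# conf/local/credentials.yml (keep this file out of version control)
dev_az:
  account_name: your_storage_account   # placeholder, not a real account
  account_key: your_account_key        # placeholder, not a real key
```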

We also have more examples in the Kedro documentation, particularly example 15.
