
Download large file from Databricks DBFS with Python API

I'm trying to download some large files from Databricks storage using their DBFS API, but I'm only getting a portion of each file, under 1 MB (the size restriction for a single API call).

There is an example of a large file upload which uses a session with a handle. I assume I would need something like that, but I can't wrap my head around it.

The read functionality doesn't have a handle, but it does have an offset argument. I assume I could make a loop with an incremental offset to pull 1 MB per call, but that doesn't sound like an optimal solution. Moreover, when I try that, I still get files of only ~520 KB.
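
Roughly, this is the loop I have in mind, calling the DBFS API 2.0 /dbfs/read endpoint with an increasing offset (a minimal sketch; the workspace URL, token and file paths below are placeholders):

    import base64
    import requests

    # Placeholders: substitute your own workspace URL, personal access token and paths.
    HOST = "https://<workspace-instance>.azuredatabricks.net"
    TOKEN = "<personal-access-token>"
    DBFS_PATH = "/FileStore/tables/large_file.csv"
    LOCAL_PATH = "large_file.csv"
    CHUNK = 1024 * 1024  # /dbfs/read returns at most 1 MB per call

    headers = {"Authorization": f"Bearer {TOKEN}"}
    offset = 0

    with open(LOCAL_PATH, "wb") as out:
        while True:
            resp = requests.get(
                f"{HOST}/api/2.0/dbfs/read",
                headers=headers,
                params={"path": DBFS_PATH, "offset": offset, "length": CHUNK},
            )
            resp.raise_for_status()
            body = resp.json()
            if body["bytes_read"] == 0:
                break  # reached the end of the file
            # Each chunk comes back base64-encoded in the "data" field.
            out.write(base64.b64decode(body["data"]))
            offset += body["bytes_read"]

Is there a cleaner way to stream the whole file, or a way to use a handle like the upload example does?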

Scenario 1

Using the Databricks portal you can directly download up to 1 million rows.

Scenario 2

Install the Azure Databricks CLI and configure it for your Azure Databricks workspace. Then use the command dbfs cp <file_to_download> <local_filename> to download the file. This lets you use the DBFS API 2.0 through a Unix-style command-line interface (CLI); a Python sketch of the same copy follows the reference below.

Reference: Access DBFS with Azure Databricks.
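
If you want to drive the same copy from Python instead of the shell, the databricks-cli package that provides dbfs cp can also be used programmatically. A rough sketch, assuming its DbfsApi.get_file helper (workspace URL, token and paths are placeholders):

    from databricks_cli.sdk.api_client import ApiClient
    from databricks_cli.dbfs.api import DbfsApi
    from databricks_cli.dbfs.dbfs_path import DbfsPath

    # Placeholders: use your own workspace URL and personal access token.
    client = ApiClient(host="https://<workspace-instance>.azuredatabricks.net",
                       token="<personal-access-token>")

    # get_file downloads the file in chunks, so files larger than 1 MB work too.
    DbfsApi(client).get_file(DbfsPath("dbfs:/FileStore/tables/dd.csv"),
                             "dd.csv",
                             overwrite=True)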

Scenario 3

Directly download DBFS files with a web URL.

Sample Final URL:

  https://adb-87xxxxxxxxx.9.azuredatabricks.net/files/tables/dd.csv/?o=8xxxxxxxxxxxx

