Azure Databricks: difference between FileStore and DBFS

I am using Azure Databricks with ADLS as the storage layer. What is the difference between DBFS and FileStore? What is the maximum size of a file that can be stored in FileStore? And can we store output files in FileStore and then overwrite them?

Thank you.

DBFS is an abstraction over the cloud storage implementations that allows you to work with files in cloud storage using simple paths instead of full URLs. From the documentation:

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage and offers the following benefits:

  • Allows you to mount storage objects so that you can seamlessly access data without requiring credentials.
  • Allows you to interact with object storage using directory and file semantics instead of storage URLs.
  • Persists files to object storage, so you won't lose data after you terminate a cluster.
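To illustrate the "directory and file semantics" point: on a cluster, the same DBFS file can usually be addressed as a `dbfs:/...` URI (for Spark and `dbutils.fs`) or as a local path under the `/dbfs` mount (for ordinary file APIs). Below is a minimal sketch of converting between the two forms. The helper names `to_local_path` and `to_dbfs_uri` are made up for this example and are not part of any Databricks API.

```python
# Hypothetical helpers (not a Databricks API): convert between the two
# common ways of writing a DBFS path.

DBFS_URI_PREFIX = "dbfs:/"
LOCAL_MOUNT_PREFIX = "/dbfs/"


def to_local_path(dbfs_uri: str) -> str:
    """Turn 'dbfs:/FileStore/x.csv' into '/dbfs/FileStore/x.csv'."""
    if not dbfs_uri.startswith(DBFS_URI_PREFIX):
        raise ValueError(f"not a dbfs:/ URI: {dbfs_uri!r}")
    # Drop the scheme and any extra leading slashes, then re-root under /dbfs.
    remainder = dbfs_uri[len(DBFS_URI_PREFIX):].lstrip("/")
    return LOCAL_MOUNT_PREFIX + remainder


def to_dbfs_uri(local_path: str) -> str:
    """Turn '/dbfs/FileStore/x.csv' into 'dbfs:/FileStore/x.csv'."""
    if not local_path.startswith(LOCAL_MOUNT_PREFIX):
        raise ValueError(f"not a /dbfs/ path: {local_path!r}")
    return DBFS_URI_PREFIX + local_path[len(LOCAL_MOUNT_PREFIX):]
```

For example, `to_local_path("dbfs:/FileStore/out/result.csv")` gives a path that plain Python `open()` can use on a cluster, without any storage account URL or credentials appearing in the code.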

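FileStore itself is just a special folder (/FileStore) inside DBFS, so it is not a separate storage system. Under the hood, on Azure DBFS uses the same ADLS storage, so the same limits should apply (the current limit is about 200 TB per file).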

P.S. Please note that there is the so-called DBFS Root, which lives in the storage account that is created automatically during workspace creation, and there are DBFS mounts, which point to "external" storage accounts. It's generally recommended to use DBFS Root only for temporary files, because if you delete the workspace, that storage account will be removed as well.
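On the overwrite part of the question: since DBFS exposes file semantics, writing to an existing path with a normal "write" mode replaces the previous contents. Below is a small sketch using only the Python standard library. The function name `write_output` and the example paths are assumptions for illustration; on a cluster the root would typically be `/dbfs`, but it is a parameter here so the snippet can also be tried on a local directory.

```python
from pathlib import Path


def write_output(root: str, relative_path: str, text: str) -> Path:
    """Write text to root/relative_path, replacing any existing file.

    `root` is assumed to be the DBFS local mount ("/dbfs") on a cluster;
    any writable directory works for a local trial run.
    """
    target = Path(root) / relative_path
    # Create intermediate folders such as FileStore/out if they are missing.
    target.parent.mkdir(parents=True, exist_ok=True)
    # write_text opens the file in "w" mode, which truncates existing content.
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `write_output("/dbfs", "FileStore/out/report.txt", "...")` twice leaves only the second version of the file. In a notebook, `dbutils.fs.put(path, contents, overwrite=True)` is another way to get the same effect.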
