
DBFS in Azure Databricks - difference between FileStore and DBFS

I am using Azure Databricks with ADLS as the storage layer. What is the difference between DBFS and FileStore? What is the maximum size of a file that can be stored in FileStore? Can we store output files in FileStore and then overwrite them?

Thank you.

DBFS is an abstraction over the cloud storage implementations that allows you to work with files in cloud storage using simple paths instead of full URLs. From the documentation:

Databricks File System (DBFS) is a distributed file system mounted into a Databricks workspace and available on Databricks clusters. DBFS is an abstraction on top of scalable object storage and offers the following benefits:

  • Allows you to mount storage objects so that you can seamlessly access data without requiring credentials.
  • Allows you to interact with object storage using directory and file semantics instead of storage URLs.
  • Persists files to object storage, so you won't lose data after you terminate a cluster.
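To make "simple paths instead of storage URLs" concrete, here is a minimal sketch of the two common spellings of one DBFS location: the `dbfs:/` form used by Spark and `dbutils`, and the `/dbfs/` form seen by local file APIs on clusters that expose the local mount. The helper names `to_dbfs_uri` and `to_local_path` and the example paths are hypothetical, not part of any Databricks API.

```python
# Without DBFS, a file in ADLS is addressed by a full storage URL such as
#   abfss://<container>@<account>.dfs.core.windows.net/<path>
# With DBFS, the same kind of file is addressed by a short path.
# The helpers below are illustrative only (hypothetical names).

DBFS_SCHEME = "dbfs:"
LOCAL_PREFIX = "/dbfs"


def to_dbfs_uri(path: str) -> str:
    """Return the dbfs:/ spelling, as accepted by Spark and dbutils."""
    if path.startswith(DBFS_SCHEME):
        # Already a DBFS URI; normalize to exactly one leading slash.
        return DBFS_SCHEME + "/" + path[len(DBFS_SCHEME):].lstrip("/")
    if path == LOCAL_PREFIX or path.startswith(LOCAL_PREFIX + "/"):
        # Strip the local mount prefix: /dbfs/FileStore/x -> /FileStore/x
        path = path[len(LOCAL_PREFIX):]
    return DBFS_SCHEME + "/" + path.lstrip("/")


def to_local_path(path: str) -> str:
    """Return the /dbfs/ spelling, as seen by local file APIs."""
    uri = to_dbfs_uri(path)
    return LOCAL_PREFIX + uri[len(DBFS_SCHEME):]


print(to_dbfs_uri("/FileStore/out/result.csv"))
print(to_local_path("dbfs:/FileStore/out/result.csv"))
```

Both spellings point at the same object in the underlying storage account; which one to use depends on whether the caller is Spark/`dbutils` or ordinary local file I/O.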

Under the hood, DBFS on Azure uses the same ADLS storage, so the same limits should apply (the current limit is 200 TB per file).
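On the overwrite part of the question: because files written through DBFS end up as ordinary objects in that storage, writing to an existing path again replaces the previous content. A minimal sketch, assuming the cluster exposes DBFS through the local `/dbfs` mount; the function name `write_output` and the default root are hypothetical, and the `root` parameter exists only so the function can also be tried outside Databricks:

```python
from pathlib import Path


def write_output(relative_path: str, text: str, root: str = "/dbfs/FileStore") -> Path:
    """Write text under `root`, replacing any existing file at that path.

    `root` defaults to the FileStore folder as seen through the local
    /dbfs mount (an assumption about the cluster setup).
    """
    target = Path(root) / relative_path
    # Create intermediate folders if they do not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    # write_text opens the file in "w" mode, so a second call overwrites.
    target.write_text(text, encoding="utf-8")
    return target
```

Calling `write_output("out/result.txt", "first")` and then `write_output("out/result.txt", "second")` leaves only the second content at that path.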

P.S. Please note that there is the so-called DBFS Root, backed by the storage account that is created automatically during workspace creation, and there are DBFS mounts to "external" storage accounts. It's generally recommended to use DBFS Root only for temporary files, because if you delete the workspace, that storage account will be removed as well.

