Is it possible to read a local excel file from within Databricks?

I am able to read an xlsx file in Databricks, but only after uploading the file into blob storage.

The code below works fine:

import pandas as pd
input_file = pd.read_excel("/dbfs/mnt/container_name/folder_name/input_file.xlsx")

Is there a way of reading an xlsx file directly from a local repository?

Ideally I'm looking for code similar to the following:

input_file = pd.read_excel("file:///C:/Users/XXX111/folder_name/input_file.xlsx")

But this raises the error:

URLError: <urlopen error [Errno 2] No such file or directory: '/C:/Users/XXX111/folder_name/input_file.xlsx'>

The file is located in C:\Users\XXX111\folder_name.

The short answer: yes, it is possible, but not in the way you want, and it is not recommended.

It's quite hard, but:

1 - You need to create the Databricks workspace in a virtual network and then peer this network with your local one, taking into account all the requirements described in the link below:

https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html

2 - After that, you must arrange for the data to be reachable within your local network, for example through a local shareable file system, SharePoint, OneDrive or any other FS.

3 - This will then allow the Databricks VMs to reach your files through your internal routing.

BUT, the best shot is to provide the data to Databricks through a place it can reach natively, such as anywhere in the cloud: Azure Data Lake, Azure SQL, Storage accounts and so on.

This brings some advantages: availability to all your workspace users 24/7, better readiness for future deployments of your algorithms, and native control over who accesses your data, using RBAC or other access-control mechanisms.
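For what it's worth, the recommended route can be sketched roughly as below. Notebook code runs on the cluster, so a path like C:\Users\... is resolved against the cluster's file system rather than your PC, which is why the file has to be pushed to cloud storage first. This is only a sketch under assumptions: upload_local_file and dbfs_local_path are hypothetical helper names, the connection string is a placeholder you would supply, the azure-storage-blob package is assumed to be installed on the local machine, and the container is assumed to be mounted under /mnt/container_name as in the question.

```python
import posixpath


def upload_local_file(connection_string: str, container: str,
                      local_path: str, blob_name: str) -> None:
    """Run on the local machine: copy one local file into a blob container.

    Assumes the azure-storage-blob package is installed; the connection
    string and every name passed in are placeholders supplied by the caller.
    """
    # Imported lazily so the rest of this sketch works without the package.
    from azure.storage.blob import BlobServiceClient

    service = BlobServiceClient.from_connection_string(connection_string)
    blob = service.get_blob_client(container=container, blob=blob_name)
    with open(local_path, "rb") as data:
        blob.upload_blob(data, overwrite=True)


def dbfs_local_path(mount_name: str, *parts: str) -> str:
    """Run on Databricks: build the /dbfs/mnt/... path that pandas can open."""
    return posixpath.join("/dbfs/mnt", mount_name, *parts)


excel_path = dbfs_local_path("container_name", "folder_name", "input_file.xlsx")
print(excel_path)

# In a Databricks notebook, with the container mounted at /mnt/container_name:
# import pandas as pd
# input_file = pd.read_excel(excel_path)
```

The upload step runs once on your PC and the read step runs in the notebook, so the two halves never need to see each other's file system.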
