
Is it possible to read a local excel file from within Databricks?

I am able to read an xlsx file in Databricks, but only after uploading the file into blob storage.

The code below works fine:

input_file = pd.read_excel("/dbfs/mnt/container_name/folder_name/input_file.xlsx")

Is there a way of reading an xlsx file directly from a local repository?

Ideally I'm looking for code similar to the below:

input_file = pd.read_excel("file:///C:/Users/XXX111/folder_name/input_file.xlsx")

This results in the error:

URLError: <urlopen error [Errno 2] No such file or directory: '/C:/Users/XXX111/folder_name/input_file.xlsx'>

The location of the file is in C:\Users\XXX111\folder_name.

The short answer: yes, it's possible, but not the way you want, and it's not recommended.

It's quite involved, but here are the steps:

1 - You need to create the Databricks workspace in a virtual network and then peer this network with your local one, meeting all the requirements described in the link below:

https://docs.azuredatabricks.net/administration-guide/cloud-configurations/azure/vnet-inject.html

2 - After that, you must arrange for the data to be reachable from that network, for example through a locally shared file system, SharePoint, OneDrive, or any other network file share.

3 - This will enable the Databricks VMs to reach your files through your internal routing.
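Once the routing from the steps above is in place, the read itself is unchanged: `pd.read_excel` accepts any path that is local to the Python process on the driver, whether it comes from a DBFS mount or a network share mounted on the driver node. A minimal, self-contained sketch (the file is created locally here just so the example runs; the mount path in your setup is an assumption you'd substitute):

```python
import os
import tempfile

import pandas as pd

# Stand-in for a driver-local path such as /dbfs/mnt/... or a mounted
# network share. Here we write a tiny workbook to a temp directory so
# the example is runnable end to end (writing .xlsx needs openpyxl).
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "input_file.xlsx")
pd.DataFrame({"value": [1, 2, 3]}).to_excel(path, index=False)

# Exactly the call from the question -- it only cares that the path is
# visible to the local filesystem of the process running pandas.
input_file = pd.read_excel(path)
print(input_file["value"].sum())  # 6
```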

BUT, your best bet is to provide the data to Databricks through a place it can reach natively, i.e. anywhere in the cloud: Azure Data Lake, Azure SQL, Storage accounts, and so on.

This brings advantages such as 24/7 availability to all your workspace users, better readiness for future deployments of your algorithms, and native control over who accesses your data, using RBAC or other access-control mechanisms.
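A note on the path convention behind the working snippet above: data in a mounted storage account lives at a `dbfs:/mnt/...` URI, but local-file APIs such as pandas see it through the `/dbfs/` FUSE mount, which is why the snippet reads from `/dbfs/mnt/container_name/...`. A hypothetical helper mapping one form to the other:

```python
def dbfs_to_local(path: str) -> str:
    """Map a DBFS URI (dbfs:/...) to the driver-local FUSE path (/dbfs/...)
    that local-file APIs such as pandas expect. Hypothetical helper --
    not part of any Databricks library."""
    prefix = "dbfs:/"
    if path.startswith(prefix):
        return "/dbfs/" + path[len(prefix):].lstrip("/")
    # Already a local-style path; return it unchanged.
    return path

print(dbfs_to_local("dbfs:/mnt/container_name/folder_name/input_file.xlsx"))
# /dbfs/mnt/container_name/folder_name/input_file.xlsx
```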
