
How to save and download a csv locally from DBFS?

I'm trying to save a csv file as the result of a SQL query sent to Athena via Databricks. The file is expected to be a big table of about 4-6 GB (~40m rows).

I'm doing the following steps:

  1. Creating a PySpark dataframe by:

     df = sqlContext.sql("select * from my_table where year = 19")
  2. Converting the PySpark dataframe to a Pandas dataframe. I realize this step may be unnecessary, but I'm only starting to use Databricks and may not know the commands needed to do this more efficiently. So I do it like this:

     ab = df.toPandas()
  3. Saving the file somewhere so I can download it locally later (see the note after this list):

     ab.to_csv('my_my.csv')
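A note on step 3: on a Databricks cluster, a relative path like 'my_my.csv' is written to the driver's local filesystem, not to DBFS. A minimal sketch of writing to DBFS instead, assuming the default /dbfs FUSE mount and the FileStore path used later in this thread:

     # Writing under /dbfs/ goes through the DBFS FUSE mount, so the file
     # becomes visible as dbfs:/FileStore/tables/my_my.csv (path is an example).
     ab.to_csv('/dbfs/FileStore/tables/my_my.csv', index=False)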

But how do I download it?

I kindly ask you to be very specific, as I don't know many of the tricks and details of working with Databricks.

Using the GUI, you can download full results (max 1 million rows).


To download full results, first save the file to DBFS and then copy the file to your local machine using the Databricks CLI, as follows.

dbfs cp "dbfs:/FileStore/tables/my_my.csv" "A:\AzureAnalytics"
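For a result of ~40m rows, it may be easier to skip the Pandas conversion entirely and have Spark write the CSV to DBFS itself. A minimal sketch, assuming the dataframe df from the question; note that Spark writes a directory of part files, and coalesce(1) forces a single file at the cost of funnelling all data through one task:

     # Write the query result straight to DBFS as CSV.
     # coalesce(1) is optional and expensive at this scale.
     (df.coalesce(1)
        .write.mode("overwrite")
        .option("header", "true")
        .csv("dbfs:/FileStore/tables/my_my_csv"))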

Reference: Databricks file system

The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:

# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana

Reference: Installing and configuring Azure Databricks CLI
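If the CLI is not installed and configured yet, the usual setup looks roughly like the following sketch; the host URL and token are placeholders for your own workspace values:

# Install the Databricks CLI and authenticate with a personal access token
pip install databricks-cli
databricks configure --token
# When prompted:
#   Databricks Host: https://<your-workspace>.azuredatabricks.net
#   Token: <your-personal-access-token>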

Hope this helps.
