How to save a csv in DBFS and download it locally?
I'm trying to save a csv file as the result of a SQL query sent to Athena via Databricks. The file is supposed to be a big table of about 4-6 GB (~40m rows).
I'm doing the following steps:
Creating a PySpark dataframe by:
df = sqlContext.sql("select * from my_table where year = 19")
Converting the PySpark dataframe to a Pandas dataframe. I realize this step may be unnecessary, but I've only just started using Databricks and may not know the commands to do it more swiftly. So I do it like this:
ab = df.toPandas()
Saving the file somewhere so I can download it locally later:
ab.to_csv('my_my.csv')
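One thing worth noting: `to_csv` with a bare filename writes to the driver's local working directory, not to DBFS, so the GUI and `dbfs cp` won't see it. Writing under the `/dbfs/FileStore` mount puts the file in DBFS. A minimal sketch, using a small demo dataframe in place of the real query result (the `/dbfs` path shown in the comment is what you'd use on a Databricks cluster; locally this sketch just writes to the working directory):

```python
import pandas as pd

# Small demo frame standing in for ab = df.toPandas()
ab = pd.DataFrame({"year": [19, 19], "value": [1.5, 2.5]})

# On Databricks, write under the FileStore mount so the file lands in DBFS
# and is reachable by the GUI and the CLI:
#   out_path = "/dbfs/FileStore/tables/my_my.csv"
# For this local sketch, write to the current directory instead:
out_path = "my_my.csv"
ab.to_csv(out_path, index=False)
```

The `/dbfs/...` POSIX-style path and the `dbfs:/...` URI refer to the same location; the former is used by regular Python file APIs, the latter by Spark and the `dbfs` CLI.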
But how do I download it? I kindly ask you to be very specific, as I do not know many tricks and details of working with Databricks.
Using the GUI, you can download full results (max 1 million rows). To download the full results, first save the file to DBFS and then copy the file to your local machine using the Databricks CLI, as follows:
dbfs cp "dbfs:/FileStore/tables/my_my.csv" "A:\AzureAnalytics"
Reference: Databricks file system
The DBFS command-line interface (CLI) uses the DBFS API to expose an easy-to-use command-line interface to DBFS. Using this client, you can interact with DBFS using commands similar to those you use on a Unix command line. For example:
# List files in DBFS
dbfs ls
# Put local file ./apple.txt to dbfs:/apple.txt
dbfs cp ./apple.txt dbfs:/apple.txt
# Get dbfs:/apple.txt and save to local file ./apple.txt
dbfs cp dbfs:/apple.txt ./apple.txt
# Recursively put local dir ./banana to dbfs:/banana
dbfs cp -r ./banana dbfs:/banana
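Before any of the `dbfs` commands above will work, the CLI has to be installed and pointed at your workspace. A minimal setup sketch (the prompts will ask for your workspace URL and a personal access token, which you generate in your Databricks user settings):

```shell
# Install the legacy Databricks CLI, which provides the `dbfs` command
pip install databricks-cli

# Configure it interactively; this prompts for the workspace host and a
# personal access token and stores them in ~/.databrickscfg
databricks configure --token
```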
Reference: Installing and configuring the Azure Databricks CLI
Hope this helps.