
Problem renaming a file in Azure Databricks from a data lake

I am trying to rename a file with Python in Azure Databricks, using the `os` library's `rename()` function. It is something very simple, but when I do it in Databricks I can't reach the path in the data lake where my file is. However, running the command `%fs ls path_file` does show it, and I can even read it and process it with PySpark without problems.

Here is an example of my code:

import os
old_name = r"/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv"
new_name = r"/mnt/datalake/path/example.csv"

os.rename(old_name, new_name)

The above returns an error that it cannot find the path or file, but an `ls` command lists that same path without problems.

On the other hand, I have tried to rename the file with PySpark, but that uses a Hadoop library (org.apache.hadoop.conf.Configuration) that I do not have installed, and I cannot install it in the production environment...

What am I missing?

If you're using `os.rename`, you need to refer to files as `/dbfs/mnt/...`, because you're using a local file API to access DBFS.
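To illustrate, a small sketch of the path adjustment (the `to_local_path` helper is hypothetical, just to show the mapping; on a Databricks cluster the `/dbfs` FUSE mount makes these paths visible to local file APIs):

```python
import os

def to_local_path(dbfs_path: str) -> str:
    """Map a DBFS path like /mnt/... to the local FUSE path /dbfs/mnt/...
    expected by local file APIs such as os.rename. (Hypothetical helper.)"""
    return "/dbfs" + dbfs_path if dbfs_path.startswith("/mnt") else dbfs_path

old_name = to_local_path("/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv")
new_name = to_local_path("/mnt/datalake/path/example.csv")

# On a Databricks cluster these now resolve through the /dbfs mount, so:
# os.rename(old_name, new_name)
```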

But really, it would be better to use `dbutils.fs.mv` to rename the file:

old_name = r"/mnt/datalake/path/part-00000-tid-1761178-3f1b0942-223-1-c000.csv"
new_name = r"/mnt/datalake/path/example.csv"

dbutils.fs.mv(old_name, new_name)
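Note that `dbutils.fs.mv` takes DBFS-style paths directly, so no `/dbfs` prefix is needed. Since Spark generates a different part-file name on every run, you may want to locate it first before moving it; a sketch (the `find_part_file` helper is hypothetical):

```python
def find_part_file(names):
    """Return the first Spark part-file from a directory listing,
    or None if there isn't one. (Hypothetical helper.)"""
    for n in names:
        if n.startswith("part-") and n.endswith(".csv"):
            return n
    return None

# On Databricks you would build the listing with dbutils, e.g.:
# names = [f.name for f in dbutils.fs.ls("/mnt/datalake/path/")]
names = ["_SUCCESS", "part-00000-tid-1761178-3f1b0942-223-1-c000.csv"]

part = find_part_file(names)
# dbutils.fs.mv("/mnt/datalake/path/" + part, "/mnt/datalake/path/example.csv")
```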

