Accessing Azure DevOps Git file directly from Azure Databricks
We have a CSV file stored in an ADO (Azure DevOps) Git repository. I have an Azure Databricks cluster running, and in the workspace I have Python code that reads this CSV file and transforms it into a Spark DataFrame. But every time the file changes, I have to manually download it from ADO Git and upload it to the Databricks workspace. I use the following command to verify that the file has been uploaded:
dbutils.fs.ls ("/FileStore/tables")
It lists my file. I then use the following Python code to convert this CSV to a Spark DataFrame:
file_location = "/FileStore/tables/MyFile.csv"
file_type = "csv"
# CSV options
infer_schema = "true"
first_row_is_header = "true"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
    .option("inferSchema", infer_schema) \
    .option("header", first_row_is_header) \
    .option("sep", delimiter) \
    .load(file_location)
So this manual step is required every time the file in the ADO Git repository changes. Is there a Python function with which I can point directly to the copy of the file in the master branch of the ADO Git repository?
You have 2 choices, depending on what would be simpler for you:
1. Access the file in the Git repository from your Python code. Because this file will be accessible only from the driver node, you will need to use dbutils.fs.cp to copy the file from the driver node into /FileStore/tables.
2. Set up a build pipeline in your Git repository that is triggered only on commits to the specific file and, if it has changed, use the Databricks CLI (databricks fs cp... command) to copy the file directly into DBFS. Here is an example that does not do exactly what you want, but it could be used as inspiration.
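For the first option, one possible sketch (not taken from the original answer) is shown below. It assumes the file is fetched through the Azure DevOps REST API "Items - Get" endpoint, authenticated with a personal access token (PAT) over HTTP Basic auth. The organization, project, and repository names, the api-version value, the /tmp path, and the helper names build_item_url and download_to_driver are all hypothetical placeholders to adapt to your setup.

```python
import base64
import urllib.parse
import urllib.request


def build_item_url(organization, project, repository, path, branch="master"):
    """Build the REST URL for one file on a branch (assumed 'Items - Get' endpoint)."""
    base = "https://dev.azure.com/{}/{}/_apis/git/repositories/{}/items".format(
        urllib.parse.quote(organization),
        urllib.parse.quote(project),
        urllib.parse.quote(repository),
    )
    query = urllib.parse.urlencode({
        "path": path,
        "versionDescriptor.version": branch,
        "versionDescriptor.versionType": "branch",
        "api-version": "6.0",  # assumption: use a version your organization supports
    })
    return base + "?" + query


def download_to_driver(url, personal_access_token, local_path):
    """Download the file to the driver node's local disk."""
    # A PAT is sent as the password of HTTP Basic auth with an empty user name.
    token = base64.b64encode((":" + personal_access_token).encode()).decode()
    request = urllib.request.Request(url, headers={
        "Authorization": "Basic " + token,
        "Accept": "application/octet-stream",  # ask for raw file content
    })
    with urllib.request.urlopen(request) as response, open(local_path, "wb") as out:
        out.write(response.read())


# In a Databricks notebook (dbutils is only defined there), usage might look like:
# url = build_item_url("my-org", "my-project", "my-repo", "/MyFile.csv")
# download_to_driver(url, pat, "/tmp/MyFile.csv")
# dbutils.fs.cp("file:/tmp/MyFile.csv", "dbfs:/FileStore/tables/MyFile.csv")
```

After the copy, the existing spark.read code can keep loading /FileStore/tables/MyFile.csv unchanged. The PAT is best read from a secret store rather than hard-coded in the notebook.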