
I'm getting continuous blob files in blob storage. I have to load them into Azure SQL DB using Databricks, with Data Factory orchestrating this pipeline.

I receive data continuously in blob storage. Initially there were 5 blob files in the blob storage, and I was able to load them from blob into Azure SQL DB using Databricks, automating the run with Data Factory. The problem is that when newer files arrive in blob storage, Databricks loads them along with the older files and sends everything to Azure SQL DB again. I don't want those old files reprocessed; each run should pick up only the newer ones, so that the same data is not loaded into Azure SQL DB again and again.

The easiest way to do that is to archive each file you have just read into a new folder, say archiveFolder. Suppose your Databricks job is reading from the following directory:

mnt
  sourceFolder
    file1.txt
    file2.txt
    file3.txt

You run your code, ingest the files, and load them into SQL Server. Then you can simply archive these files (move them from sourceFolder into archiveFolder). In Databricks this can be done with the following command:

dbutils.fs.mv(sourcefilePath, archiveFilePath, True)  # third argument: recurse, needed when moving a directory

So the next time your code runs, only the new files will be present in your sourceFolder.
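The read-then-archive pattern above can be sketched outside Databricks using the local filesystem as a stand-in for the mounted storage, with `shutil.move` playing the role of `dbutils.fs.mv` (the folder names mirror the example; the database write is left as a placeholder):

```python
import shutil
import tempfile
from pathlib import Path

def ingest_and_archive(source: Path, archive: Path) -> list:
    """Process every file in `source`, then move it to `archive`
    so the next run only sees newly arrived files."""
    archive.mkdir(parents=True, exist_ok=True)
    processed = []
    for f in sorted(source.glob("*.txt")):
        # ... read f and write its rows to Azure SQL DB here ...
        shutil.move(str(f), str(archive / f.name))  # analogue of dbutils.fs.mv
        processed.append(f.name)
    return processed

# Demo on a temporary directory tree matching the example layout
root = Path(tempfile.mkdtemp())
src, arc = root / "sourceFolder", root / "archiveFolder"
src.mkdir()
for name in ("file1.txt", "file2.txt", "file3.txt"):
    (src / name).write_text("data")

print(ingest_and_archive(src, arc))  # first run picks up all three files
print(ingest_and_archive(src, arc))  # second run: sourceFolder is empty, prints []
```

Because each run empties sourceFolder after a successful load, re-running the pipeline cannot re-insert rows that were already sent to the database.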

