
Reading data from Azure Data Lake in C#

I have a requirement to save a large amount (>100GB per day) of transactional data to Azure Data Lake Gen2. The data consists of many small JSON transactions, so I was planning to batch the transactions together into logical file groups to avoid creating lots of small files. This will allow data analysis to run over the entire dataset.

However, I also have a separate requirement to retrieve individual transactions from a C# app. Is that possible? There doesn't seem to be an appropriate method in the REST API, and the U-SQL examples that I've found don't seem to be exposed to C# apps in any way.

Maybe I'm trying to use the data lake for the wrong purpose, but I don't want to save this quantity of data twice if I can help it.

Thanks!

This solution will allow T-SQL queries against all your JSON files:

  1. Create a Data Factory pipeline to read the JSON files and output Parquet-formatted files.
  2. Use Azure Synapse Workspace On-Demand to read the Parquet files with OPENROWSET pointing to the Azure Storage location of the Parquet files.
  3. In Synapse Workspace On-Demand, create a SQL Server login for the C# app.
  4. Use ADO.NET to send SQL commands from C#.
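
As a rough illustration of steps 2 to 4, the sketch below looks up a single transaction by ID from C#. It is only a sketch under assumptions, not a tested implementation: the server name, database, login, storage path, and the `TransactionId` and `Payload` column names are all hypothetical placeholders that depend on how the Parquet files were produced. It assumes the `Microsoft.Data.SqlClient` NuGet package and a login that is allowed to read the storage location.

```csharp
using System.Threading.Tasks;
using Microsoft.Data.SqlClient; // NuGet package: Microsoft.Data.SqlClient

public static class TransactionLookup
{
    // Hypothetical placeholders: replace <workspace>, <database>, <login>
    // and <password> with the values for your own Synapse workspace.
    private const string ConnectionString =
        "Server=tcp:<workspace>-ondemand.sql.azuresynapse.net,1433;" +
        "Database=<database>;User ID=<login>;Password=<password>;" +
        "Encrypt=True;";

    // OPENROWSET reads the Parquet files directly from storage.
    // The storage URL and the column names are assumptions for illustration.
    private const string Query = @"
SELECT TOP 1 t.TransactionId, t.Payload
FROM OPENROWSET(
    BULK 'https://<account>.dfs.core.windows.net/<container>/transactions/*.parquet',
    FORMAT = 'PARQUET'
) AS t
WHERE t.TransactionId = @id;";

    // Returns the payload of the matching transaction, or null if none is found.
    public static async Task<string> GetPayloadAsync(string transactionId)
    {
        using var connection = new SqlConnection(ConnectionString);
        await connection.OpenAsync();

        using var command = new SqlCommand(Query, connection);
        // Pass the ID as a parameter rather than concatenating it into the SQL.
        command.Parameters.AddWithValue("@id", transactionId);

        using var reader = await command.ExecuteReaderAsync();
        if (await reader.ReadAsync())
        {
            return reader.GetString(reader.GetOrdinal("Payload"));
        }
        return null;
    }
}
```

A caller would use it as `var payload = await TransactionLookup.GetPayloadAsync("some-id");`. Note that a query like this may have to scan many files to find one row, so narrowing the `BULK` path (for example to a single day's folder, if the files are laid out that way) is likely to matter for cost and latency.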

