
Connect Azure Event Hubs with Data Lake Store

What is the best way to send data from Event Hubs to Data Lake Store?

I am assuming you want to ingest data from Event Hubs to Data Lake Store on a regular basis. Like Nava said, you can use Azure Stream Analytics to get data from Event Hubs into Azure Storage Blobs. Thereafter you can use Azure Data Factory (ADF) to copy data on a scheduled basis from Blobs to Azure Data Lake Store. More details on using ADF are available here: https://azure.microsoft.com/en-us/documentation/articles/data-factory-azure-datalake-connector/. Hope this helps.
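For the scheduled Blob-to-Data-Lake hop, here is a hedged sketch using the current azure-mgmt-datafactory (Data Factory v2) Python package. The factory, its linked services, and the dataset names "BlobDataset" and "AdlsDataset" are assumed to already exist; all names and IDs below are placeholders, not part of the original answer:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    PipelineResource, CopyActivity, DatasetReference,
    BlobSource, AzureDataLakeStoreSink,
)

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# A copy activity that reads from a Blob dataset and writes to an ADLS dataset.
copy_activity = CopyActivity(
    name="CopyBlobsToAdls",
    inputs=[DatasetReference(reference_name="BlobDataset")],    # assumed to exist
    outputs=[DatasetReference(reference_name="AdlsDataset")],   # assumed to exist
    source=BlobSource(),
    sink=AzureDataLakeStoreSink(),
)

client.pipelines.create_or_update(
    "<resource-group>", "<factory-name>", "BlobToAdlsPipeline",
    PipelineResource(activities=[copy_activity]),
)
# A schedule trigger on this pipeline would then run the copy on the desired cadence.
```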

== March 17, 2016 update ==

Support for Azure Data Lake Store as an output for Azure Stream Analytics is now available: https://blogs.msdn.microsoft.com/streamanalytics/2016/03/14/integration-with-azure-data-lake-store/. This will be the best option for your scenario.

Sachin Sheth

Program Manager, Azure Data Lake

In addition to Nava's reply: you can query data in a Windows Azure Blob Storage container with ADLA/U-SQL as well. Or you can use the Blob Store to ADL Storage copy service (see https://azure.microsoft.com/en-us/documentation/articles/data-lake-store-copy-data-azure-storage-blob/).

One way would be to write a process that reads messages from the Event Hubs API and writes them into Data Lake Store using the Data Lake Store SDK.
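A minimal Python sketch of that approach, assuming the azure-eventhub (v5) and azure-datalake-store (Gen1) packages; the connection string, account name, credentials, and file path below are placeholders:

```python
from azure.eventhub import EventHubConsumerClient
from azure.datalake.store import core, lib

EH_CONN_STR = "<event-hubs-connection-string>"  # placeholder
EVENTHUB_NAME = "<eventhub-name>"               # placeholder
ADLS_STORE = "<adls-gen1-account>"              # placeholder

# Authenticate to Data Lake Store with a service principal (placeholder IDs).
token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<client-id>",
                 client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name=ADLS_STORE)

def on_event(partition_context, event):
    # Append each event body, one line per event, to a per-partition file.
    path = f"/eventhub/partition-{partition_context.partition_id}.jsonl"
    with adl.open(path, "ab") as f:
        f.write(event.body_as_str().encode("utf-8") + b"\n")

client = EventHubConsumerClient.from_connection_string(
    EH_CONN_STR, consumer_group="$Default", eventhub_name=EVENTHUB_NAME)

with client:
    # Read from the start of each partition and process events as they arrive.
    client.receive(on_event=on_event, starting_position="-1")
```

Opening the file once per event keeps the sketch short; a real ingester would batch events and use a checkpoint store to track progress.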

Another alternative would be to use Stream Analytics to get data from Event Hubs into a Blob, and Azure Automation to run a PowerShell script that reads the data from the blob and writes it into a Data Lake Store.
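The runbook itself would be PowerShell; as a language-neutral illustration, here is a hedged Python version of the same copy logic, assuming the azure-storage-blob (v12) and azure-datalake-store packages, with all names as placeholders:

```python
from azure.storage.blob import BlobServiceClient
from azure.datalake.store import core, lib

# Source: the container Stream Analytics writes to (placeholder names).
blob_svc = BlobServiceClient.from_connection_string("<storage-connection-string>")
container = blob_svc.get_container_client("streamanalytics-output")

# Destination: an ADLS Gen1 account (placeholder credentials).
token = lib.auth(tenant_id="<tenant-id>", client_id="<client-id>",
                 client_secret="<client-secret>")
adl = core.AzureDLFileSystem(token, store_name="<adls-gen1-account>")

# Copy every blob under the Stream Analytics output prefix into the store.
for blob in container.list_blobs(name_starts_with="output/"):
    data = container.download_blob(blob.name).readall()
    with adl.open(f"/ingest/{blob.name}", "wb") as f:
        f.write(data)
```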

Not taking credit for this, but sharing with the community:

It is also possible to archive the events (look into properties\archive); this leaves an Avro blob.

Then, using the AvroExtractor, you can convert the records into JSON as described in Anthony's blog: http://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/
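If you'd rather inspect a captured file outside U-SQL, a small Python sketch with the fastavro package (an assumption, not part of the original answer) looks like this; Event Hubs Capture records carry the payload in their "Body" field:

```python
import json
import fastavro

# Read a captured Avro file that has been downloaded locally (placeholder path).
with open("capture.avro", "rb") as f:
    for record in fastavro.reader(f):
        payload = record["Body"]     # bytes, exactly as the sender produced them
        event = json.loads(payload)  # assumes the sender wrote JSON payloads
        print(event)
```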

One of the ways would be to connect your Event Hub to Data Lake using the Event Hubs Capture functionality (Data Lake and Blob Storage are currently supported). Event Hubs would write to Data Lake at every N-minute interval or once a data size threshold is reached. It is used to optimize storage "write" operations, as they are expensive at high scale.
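Capture is usually switched on in the portal; below is a hedged sketch of the same configuration through the azure-mgmt-eventhub Python package. The model names come from that SDK, but every resource name, ID, and path is a placeholder, and the 300-second / 300 MB thresholds are just example values:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.eventhub import EventHubManagementClient
from azure.mgmt.eventhub.models import (
    Eventhub, CaptureDescription, Destination, EncodingCaptureDescription,
)

client = EventHubManagementClient(DefaultAzureCredential(), "<subscription-id>")

eventhub = Eventhub(
    partition_count=4,
    capture_description=CaptureDescription(
        enabled=True,
        encoding=EncodingCaptureDescription.AVRO,
        interval_in_seconds=300,          # write every 5 minutes...
        size_limit_in_bytes=314572800,    # ...or once ~300 MB accumulate
        destination=Destination(
            name="EventHubArchive.AzureDataLake",  # ADLS Gen1 capture destination
            data_lake_subscription_id="<subscription-id>",
            data_lake_account_name="<adls-gen1-account>",
            data_lake_folder_path="/eventhub-capture",
            archive_name_format=(
                "{Namespace}/{EventHub}/{PartitionId}/"
                "{Year}/{Month}/{Day}/{Hour}/{Minute}/{Second}"
            ),
        ),
    ),
)

client.event_hubs.create_or_update(
    "<resource-group>", "<namespace>", "<eventhub>", eventhub)
```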

The data is stored in Avro format, so if you want to query it using U-SQL you'd have to use an Extractor class. Uri gave a good reference for it: https://anthonychu.ca/post/event-hubs-archive-azure-data-lake-analytics-usql/
