
Is there a resource for getting some real-world ETL examples?

I'm fully convinced that a significant part of the work I'm doing falls into the domain of ETL, but I didn't even know the term existed until 3 months ago. I've found SSIS to be a bit of a mismatch for my skillset, i.e. my instincts are that writing C# code in a well-thought-out way will give me the result I need (also, my employer doesn't own it). I started looking at WF because it seemed logical, but I came back to my original conclusion: I really need to understand the fundamentals of the problem domain, and once I do, it will make the most sense to leverage my experience and code the solution in .NET/C# (I'm a one-man team and that doesn't seem to be changing). So far I have a sort of hodge-podge of syncher utilities, and it was the difficulty that began arising in managing them all that led me to seek out this knowledge.

QUESTION 1 is: is there a resource where I can get some examples of how it's all put together for things like:

  • extracting from REST services with usage limits --> loading to databases, for synchronization that is as close to real time as possible
  • extracting from in-house 3rd party apps like QuickBooks --> loading to databases
  • monitoring for changes in a database and updating external systems in carefully tracked batches (i.e. the same information that was extracted is changed by an LOB app and then needs to be pushed back)

QUESTION 2 is: I've yet to grasp where the T part will come into play. Thus far I've been pulling the information that represents logical entities in one system and pushing it into another.

I don't have any examples of the exact scenarios you're looking at, but if you want to learn more about ETL itself, you can try taking a look at the articles on Ayende's site. He has an extremely easy-to-use framework for ETL processes called Rhino ETL, and a video showing how to use it.

As for where the T part comes into play, the T stands for Transform. This is the step in the process where you can (but do not necessarily have to) change the shape of the data. After Extracting from one data source, you can add or remove fields, aggregate information, break objects up into tables, map tables into objects, etc. This part is the transform step. You then proceed to Load the data into the new data storage or system.
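To make the three steps concrete, here is a minimal sketch of an Extract, Transform, Load pass. It is written in Python with the standard-library sqlite3 module only to keep it short; the same shape carries over directly to C#. Everything in it is an assumption made up for illustration: the sample records, the field names, the `customers` and `invoices` tables, and the function names `extract`, `transform`, `load` and `run`. It is not how Rhino ETL or any other framework does it.

```python
import sqlite3
from collections import defaultdict


def extract():
    """Extract: hypothetical flat records, standing in for what a REST
    service or a third-party app might hand back."""
    return [
        {"customer": "Acme", "email": "ap@acme.test", "amount": "120.50", "internal_note": "x"},
        {"customer": "Acme", "email": "ap@acme.test", "amount": "79.50", "internal_note": "y"},
        {"customer": "Globex", "email": "pay@globex.test", "amount": "10.00", "internal_note": "z"},
    ]


def transform(records):
    """Transform: change the shape of the data before loading it.

    - drops a field the target does not need (internal_note)
    - converts amount from text to a number
    - breaks each flat record up into rows for two tables
    - aggregates a per-customer total (a field the source never had)
    """
    emails = {}
    invoice_rows = []
    totals = defaultdict(float)
    for rec in records:
        name = rec["customer"]
        amount = float(rec["amount"])
        emails[name] = rec["email"]
        invoice_rows.append((name, amount))
        totals[name] += amount
    customer_rows = [(name, email, totals[name]) for name, email in emails.items()]
    return customer_rows, invoice_rows


def load(conn, customer_rows, invoice_rows):
    """Load: write the reshaped rows into the target database."""
    conn.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, email TEXT, total REAL)")
    conn.execute("CREATE TABLE invoices (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", customer_rows)
    conn.executemany("INSERT INTO invoices VALUES (?, ?)", invoice_rows)
    conn.commit()


def run():
    """Wire the three steps together against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    customer_rows, invoice_rows = transform(extract())
    load(conn, customer_rows, invoice_rows)
    return conn
```

If the source and target already share the same shape, `transform` can simply pass the records through, which is essentially the pull-and-push situation described in the question.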

Hope that helps some.


 