
ETL mechanisms for MySQL to SQL Server over WAN

I'm looking for some feedback on mechanisms to batch data from MySQL Community Server 5.1.32 on an external host down to an internal SQL Server 2005 Enterprise machine over VPN. The external box accumulates data throughout business hours (about 100Mb per day), which then needs to be transferred internationally across a WAN connection (quality not yet determined, but it's not going to be super fast) to an internal corporate environment before some BI work is performed. This should just be change-sets making their way down each night.

I'm interested in thoughts on the ETL mechanisms people have successfully used in similar scenarios before. SSIS seems like a potential candidate; can anyone comment on its suitability for this scenario? Alternatively, other thoughts on how to do this in a cost-conscious way would be most appreciated. Thanks!

It depends on how you use the data received from the external machine.

If you must have the data for the next morning's calculations, or you do not have confidence in your network, you would prefer to loosely couple the two systems and put some message queuing between them. That way, if something fails during the night (the DBs, the network links, anything that would be a pain for you to recover), you can still start every morning with some data.
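To make the loose-coupling idea a bit more concrete, here is a minimal store-and-forward sketch. It is not a real message broker: the `outbox` table, the `send` callback and the batch names are all made up for illustration, and SQLite stands in for whatever durable queue you would actually pick. The point is only that a batch is marked as sent after delivery succeeds, so a failed night leaves a backlog rather than a gap.

```python
import sqlite3

def open_outbox(path=":memory:"):
    """Open (or create) a durable outbox; use a file path in real use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL, "
        "sent INTEGER NOT NULL DEFAULT 0)"
    )
    return conn

def enqueue(conn, payload):
    """Record a batch locally before any attempt to ship it."""
    with conn:
        conn.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))

def drain(conn, send):
    """Deliver unsent batches in order; stop at the first network failure."""
    delivered = 0
    rows = conn.execute(
        "SELECT id, payload FROM outbox WHERE sent = 0 ORDER BY id"
    ).fetchall()
    for msg_id, payload in rows:
        try:
            send(payload)
        except OSError:
            break  # link is down; keep the backlog for the next run
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (msg_id,))
        delivered += 1
    return delivered

# Hypothetical transport: fails while the (simulated) VPN link is down.
link = {"up": False}
received = []

def send(payload):
    if not link["up"]:
        raise ConnectionError("VPN link is down")
    received.append(payload)

outbox = open_outbox()
enqueue(outbox, "batch-2009-04-01")
enqueue(outbox, "batch-2009-04-02")
first_try = drain(outbox, send)    # link down: nothing delivered, nothing lost
link["up"] = True
second_try = drain(outbox, send)   # next run ships the backlog in order
```

Note that this gives at-least-once delivery (a crash between `send` and the `UPDATE` would resend a batch), so the load on the receiving side should tolerate duplicates.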

If the data retrieval is not highly critical, any solution is good :)

Regarding SSIS, it's just a great ETL framework (yes, there's a subtlety :)). But I don't see it as part of the data transfer; rather, it belongs in the ETL part, once your data has been received or while it is still waiting in the message-queuing system.

First, if you are going to do this, have a good way to easily see what has changed since the last time. Every table should have a last-updated date or a timestamp column that changes when the record is updated (not sure if MySQL has this). This is far better than comparing every single field.
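For what it's worth, MySQL does let a `TIMESTAMP` column be declared with `DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP`, which gives you this kind of column. Below is a rough sketch of the "what changed since last time" query built on it. The `orders` table, its columns and the dates are invented for the example, and SQLite is used only so the snippet runs on its own; against MySQL the same `WHERE updated_at > ?` idea applies.

```python
import sqlite3

def extract_changes(conn, since):
    """Return rows modified after the watermark `since`, oldest first."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (since,),
    )
    return cur.fetchall()

def next_watermark(rows, since):
    """Advance the watermark to the newest timestamp seen, if any."""
    return max((row[2] for row in rows), default=since)

# Stand-in source table with a last-updated column (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2009-04-01 09:00:00"),
        (2, 20.0, "2009-04-01 17:30:00"),
        (3, 30.0, "2009-04-02 11:15:00"),
    ],
)

last_run = "2009-04-01 12:00:00"            # watermark saved by the previous run
changes = extract_changes(conn, last_run)   # only rows 2 and 3 qualify
new_last_run = next_watermark(changes, last_run)
```

Taking the new watermark from the extracted rows, rather than from the clock on the machine running the job, avoids missing rows when the two servers' clocks disagree.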

If you had SQL Server in both locations I would recommend replication. Is it possible to use SQL Server instead of MySQL? If not, then SSIS is your best bet.

In terms of actually getting your data from MySQL into SQL Server, you can use SSIS to import the data using a number of methods. One would be to connect directly to your MySQL source (via an OLE DB connection or similar); alternatively, you could do a daily export from MySQL to a flat file and pick this up using an FTP Task. Once you have the data, SSIS can perform the required transforms before loading the processed data into SQL Server.
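If you go the flat-file route over a slow link, it probably pays to compress the export and ship a checksum with it, so the receiving side can reject a file that was truncated in transit before loading anything. The following is only a sketch of that idea, not an SSIS package: the function names and sample rows are made up, and the actual transfer step (FTP or otherwise) is left out.

```python
import csv
import gzip
import hashlib
import io

def export_rows(rows, header):
    """Serialize rows to gzip-compressed CSV; return (payload, sha256 hex)."""
    text = io.StringIO()
    writer = csv.writer(text, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    payload = gzip.compress(text.getvalue().encode("utf-8"))
    return payload, hashlib.sha256(payload).hexdigest()

def import_rows(payload, expected_digest):
    """Verify the checksum, then parse the CSV back into header and rows."""
    if hashlib.sha256(payload).hexdigest() != expected_digest:
        raise ValueError("checksum mismatch; file was damaged in transit")
    text = gzip.decompress(payload).decode("utf-8")
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    return header, list(reader)

# Sender side: tonight's change-set (sample data, not from a real system).
header = ["id", "amount", "updated_at"]
rows = [
    (2, 20.0, "2009-04-01 17:30:00"),
    (3, 30.0, "2009-04-02 11:15:00"),
]
payload, digest = export_rows(rows, header)

# Receiver side: values arrive as strings and still need type conversion.
got_header, got_rows = import_rows(payload, digest)
```

On the SQL Server side the parsed rows would typically go into a staging table first, with the transforms and the final load done from there.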


 