
MySQL sync/data insert from multiple databases to a single target database

One of our applications is installed at about 50 locations, each with its own local database. We need to sync the data of one table in this database to a centralized location. We want to use a queue mechanism between the local and the centralized database, so that if there is a network problem during a sync, the recent updates are stored in the queue. Whenever the network comes back, all queued records are inserted into the remote database and, once that is done, deleted from the temporary queue. That way the local database can keep working without any lock even when the network is down. We can't use a MySQL sync, because the source is 50 different databases and the target is one single database that will hold all the records from all of them. There is also no primary key in my source table.

Can anyone suggest a suitable approach for the above problem? Our source and target databases are MySQL.
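The store-and-forward queue described in the question can be sketched roughly as below. This is only an illustration under assumptions, not a tested design: Python's built-in sqlite3 stands in for the local MySQL database, and the `outbox` table, the `send` callback, and `site_id` are hypothetical names that do not appear in the question.

```python
import sqlite3


def make_local_db():
    """Create an in-memory stand-in for one site's local database (assumption: sqlite3, not MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE outbox ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "payload TEXT NOT NULL)"
    )
    return db


def enqueue(local, payload):
    """Record a change in the local queue table; this never touches the network."""
    with local:  # commits on success, rolls back on error
        local.execute("INSERT INTO outbox (payload) VALUES (?)", (payload,))


def drain(local, send, site_id):
    """Forward queued rows in insertion order and delete each one only after it was sent.

    `send(site_id, row_id, payload)` is a hypothetical callback that writes to the
    central database and raises ConnectionError when the network is down.
    Returns the number of rows forwarded in this run.
    """
    forwarded = 0
    rows = local.execute("SELECT id, payload FROM outbox ORDER BY id").fetchall()
    for row_id, payload in rows:
        try:
            send(site_id, row_id, payload)
        except ConnectionError:
            break  # network is down: leave the remaining rows queued for the next run
        with local:
            local.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        forwarded += 1
    return forwarded
```

Because a row is deleted only after `send` returns, a crash between the two steps would resend that row on the next run, so the central side would need to tolerate duplicates, for example by treating `(site_id, row_id)` as a unique key.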

This is pretty much what I have been doing for a living for the past few years, and my gut instinct is that reading 500,000 items from the source database and syncing them to the destination will not take as much time as one might think. Reading the "key" fields, computing the MD5 hash, and cross-checking against your table to avoid syncing items that haven't changed won't end up saving much time and may even run longer. I'd simply read all and update all. If that results in a runtime that is too long, then I'd compress the runtime by making the ETL multi-threaded, with each thread operating on only one segment of the table but all threads working in parallel.
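A minimal sketch of the "one segment per thread" idea, assuming the rows can be addressed by a contiguous numeric key range (the thread does not say this). `segment_ranges`, `sync_in_parallel`, and the `copy_segment` callback are hypothetical names; in a real job `copy_segment` would open its own database connections and copy the rows whose key falls in the given range.

```python
from concurrent.futures import ThreadPoolExecutor


def segment_ranges(low, high, parts):
    """Split the inclusive key range [low, high] into at most `parts` contiguous, non-overlapping ranges."""
    total = high - low + 1
    if total <= 0:
        return []
    parts = max(1, min(parts, total))
    size, extra = divmod(total, parts)
    ranges = []
    start = low
    for i in range(parts):
        # The first `extra` segments take one additional key each.
        end = start + size + (1 if i < extra else 0) - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def sync_in_parallel(low, high, workers, copy_segment):
    """Run copy_segment(start, end) for every segment on a thread pool.

    Returns the sum of the row counts reported by the workers; an exception
    raised inside any worker is re-raised here by `result()`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(copy_segment, start, end)
            for start, end in segment_ranges(low, high, workers)
        ]
        return sum(f.result() for f in futures)
```

Since the segments do not overlap, no two threads touch the same key, which is what keeps the workers from contending for the same rows.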

It is important to ensure that your destination database table has a primary key index or a unique index. Otherwise, each of your updates/inserts could lock the entire table. That would be bad if you take the multithreaded approach, but it matters even if you stay single-threaded, because your job could lock the destination DB table and interfere with the application that runs on top of that DB.
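With a unique key on the destination, "read all and update all" can be written as an upsert. The sketch below is an assumption-laden illustration: sqlite3 stands in for the destination database, and the `central_rows` table and its `(site_id, row_id)` key are made-up names. In MySQL the corresponding statement form is `INSERT ... ON DUPLICATE KEY UPDATE`; SQLite spells it `ON CONFLICT ... DO UPDATE`.

```python
import sqlite3


def make_central_db():
    """In-memory stand-in for the destination database, with a composite primary key."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE central_rows ("
        "site_id INTEGER NOT NULL, "
        "row_id INTEGER NOT NULL, "
        "payload TEXT NOT NULL, "
        "PRIMARY KEY (site_id, row_id))"
    )
    return db


def upsert_row(central, site_id, row_id, payload):
    """Insert the row, or overwrite its payload if the key already exists."""
    with central:  # one transaction per call, committed on success
        central.execute(
            "INSERT INTO central_rows (site_id, row_id, payload) VALUES (?, ?, ?) "
            "ON CONFLICT (site_id, row_id) DO UPDATE SET payload = excluded.payload",
            (site_id, row_id, payload),
        )
```

Applying the same row twice leaves a single row behind, so a sync job that is retried after a failure does not create duplicates.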

You say the source DB "may be DB2". Does "may" imply that the DB is still being designed/planned? DB2 9 or above does have built-in tracking of the last update time, and the ability to query and get back only the items that have changed since a point in time. Perhaps this is why the DB was designed without a column indicating the last updated time, e.g.:

SELECT * FROM T1 WHERE ROW CHANGE TIMESTAMP FOR T1 > CURRENT TIMESTAMP - 1 HOUR;

The timestamp cutoff for the above query would be the timestamp of your last sync run.
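The "cutoff is the last sync timestamp" idea can be sketched as a watermark that is carried from one run to the next. This is an illustration only: sqlite3 stands in for the source database, an explicit `updated_at` column (integer seconds) replaces DB2's built-in row change timestamp, and the `items` table and function names are hypothetical.

```python
import sqlite3


def make_source(rows):
    """Build an in-memory stand-in source table from (id, payload, updated_at) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE items ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "updated_at INTEGER NOT NULL)"
    )
    with db:
        db.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    return db


def changed_since(source, watermark):
    """Return (rows, new_watermark) for rows whose updated_at is later than `watermark`.

    The new watermark is the latest updated_at seen in this batch, or the old
    watermark unchanged when nothing was modified.
    """
    rows = source.execute(
        "SELECT id, payload, updated_at FROM items "
        "WHERE updated_at > ? ORDER BY updated_at, id",
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][2] if rows else watermark
    return rows, new_watermark
```

The sync job would persist `new_watermark` only after the batch has been written to the destination, so a failed run simply repeats the same window. Taking the watermark from the data rather than from the wall clock is a design choice of this sketch, not something the thread prescribes.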

If this is the case, that should solve your problem. However, your solution would end up tied very tightly to DB2, and in the future they may want to move to another DB platform and expect your sync job to keep working without being revisited. So it is important to make sure all the right people know that your product depends on staying on DB2, or that any planned migration must include restructuring the DB to add a "last changed timestamp" column and making whatever changes are necessary at the app level to populate that field.

Hope it helps! Thanks


 