
How can I synchronize two Java applications?

Original post 2012-01-18 08:40:24 8 2 java/ memory/ synchronization/ distributed

Here is a situation I have encountered: I have two similar Java applications running on different servers. Both applications obtain data from the same website using the web service it provides. But the site of course doesn't know that the first app has taken the same piece of data as the second app. After fetching, the data should be saved in a database. So I have the problem of saving the same data twice in the database.

How can I avoid duplicate entries in my DB?

There are probably two ways:

1) Use the database side. Write something that looks like "insert if unique".

2) Use the server side. Write some intermediate service that will receive responses from the two data fetchers and process them somehow.

I suppose the second solution is more efficient.

Can you advise something on this topic? How would you implement that intermediate service? How would you implement communication between the services? If we used HashMaps to store the received data, how could we estimate the maximum size of a HashMap that our system can handle?

2 answers

There are distributed frameworks for this sort of problem.

Hazelcast - will allow you to have a single distributed ConcurrentMap across multiple JVMs.
Terracotta - using its DSO (Distributed Shared Objects, I think), it will maintain a Map implementation across JVMs.
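
To illustrate why a shared ConcurrentMap helps here, below is a minimal sketch of deduplication with `putIfAbsent`: whichever application claims a record key first saves it, and the other one skips it. A local `ConcurrentHashMap` stands in for the distributed map so the example runs on its own. With Hazelcast the map would instead come from the cluster (for instance via `HazelcastInstance.getMap`), which is an assumption not shown here. The class name `DedupDemo`, the method `claim`, and the record ids are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch: deduplicate fetched records with ConcurrentMap.putIfAbsent.
// A local ConcurrentHashMap stands in for the map shared by both JVMs.
class DedupDemo {
    // Hypothetical mapping: record id -> payload.
    private final ConcurrentMap<String, String> seen;

    DedupDemo(ConcurrentMap<String, String> seen) {
        this.seen = seen;
    }

    // Returns true only for the first caller that claims this record id.
    boolean claim(String recordId, String payload) {
        return seen.putIfAbsent(recordId, payload) == null;
    }

    public static void main(String[] args) {
        ConcurrentMap<String, String> shared = new ConcurrentHashMap<>();
        DedupDemo appA = new DedupDemo(shared);
        DedupDemo appB = new DedupDemo(shared);

        // Both applications fetched overlapping data: "r2" appears twice.
        for (String id : new String[] {"r1", "r2"}) {
            if (appA.claim(id, "payload-" + id)) {
                System.out.println("app A saves " + id);
            }
        }
        for (String id : new String[] {"r2", "r3"}) {
            if (appB.claim(id, "payload-" + id)) {
                System.out.println("app B saves " + id);
            }
        }
    }
}
```

Only the application whose `claim` call returns true would go on to write the record to the database, so "r2" is stored once.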

Do you really need to fetch data on two servers simultaneously? Checking on every insert whether the entry is already present could be expensive. Merging several fetches can be time-consuming as well. Is there any benefit to fetching in parallel? Consider having one fetcher at a time.

The problem you will face is that you have to choose which one of your distributed processes should fetch the data and store it in the DB.

It is a kind of leader election problem.

Take a look at Apache ZooKeeper, which is a distributed coordination service. There is a recipe for implementing leader election with ZooKeeper.

A lot of frameworks have already implemented this recipe. I'd recommend Netflix Curator. More details about leader election with Curator are available on its wiki.
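
The idea behind such a recipe can be sketched without any coordination service: every participant registers and receives an increasing sequence number, and whoever holds the lowest number is the leader. When the leader leaves, the next lowest takes over. The code below is only an in-memory illustration of that rule. It is not the ZooKeeper or Curator API, and `ElectionRegistry` and its methods are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a coordination service (hypothetical, for illustration only).
class ElectionRegistry {
    private final AtomicLong nextSeq = new AtomicLong();
    // Sorted by sequence number, so the first entry is the current leader.
    private final ConcurrentSkipListMap<Long, String> members = new ConcurrentSkipListMap<>();

    // Join the election; each participant gets the next sequence number.
    long join(String name) {
        long seq = nextSeq.getAndIncrement();
        members.put(seq, name);
        return seq;
    }

    // Leave the election, e.g. when a process shuts down or is considered dead.
    void leave(long seq) {
        members.remove(seq);
    }

    // The participant holding the lowest sequence number is the leader.
    boolean isLeader(long seq) {
        Map.Entry<Long, String> first = members.firstEntry();
        return first != null && first.getKey() == seq;
    }

    public static void main(String[] args) {
        ElectionRegistry registry = new ElectionRegistry();
        long appA = registry.join("app-A");
        long appB = registry.join("app-B");

        // Only the leader would fetch data and store it in the DB.
        System.out.println("A leads: " + registry.isLeader(appA));
        System.out.println("B leads: " + registry.isLeader(appB));

        registry.leave(appA); // app A goes away, so app B takes over
        System.out.println("B leads after A leaves: " + registry.isLeader(appB));
    }
}
```

In a real deployment the registry would live in the coordination service rather than in one process, so that both applications see the same membership and a crashed leader is removed automatically.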