Data Deduplication in the Cloud with Java
I am trying to implement a data deduplication program in the cloud using Java, but I'm not sure how to proceed with the implementation.
First, I wanted to do a simple comparison of each file's size, date, and name. However, this is ineffective, since two files can have the same content but different names.
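One way around the metadata problem is to compare content digests instead of names or dates. A minimal sketch using `java.security.MessageDigest` (the class and method names here are my own illustrative choices):

```java
import java.security.MessageDigest;

public class ContentCompare {
    /** Two byte arrays with identical content produce identical SHA-256
     *  digests, regardless of file name, date, or path. */
    static boolean sameContent(byte[] a, byte[] b) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] digestA = md.digest(a);   // digest() also resets md for reuse
        byte[] digestB = md.digest(b);
        return MessageDigest.isEqual(digestA, digestB);
    }
}
```

In practice you would store the digest alongside each uploaded file and compare digests, rather than re-reading both files' bytes.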
I have decided on a simple pipeline: file upload -> file chunking -> Rabin-Karp hashing -> determine whether the file needs to be uploaded.
Will this be fine, or are there improvements I should make?
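The chunking and hashing steps above could be sketched as follows. This is a minimal content-defined chunker: a Rabin-Karp rolling hash slides over the data, and a chunk boundary is cut wherever the hash of the last few bytes matches a mask. The window size, base, and mask are illustrative values of my own, not tuned constants:

```java
import java.security.MessageDigest;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class DedupSketch {
    static final int WINDOW = 16;           // rolling-hash window in bytes
    static final long BASE = 257;           // polynomial base; arithmetic wraps mod 2^64
    static final long MASK = (1 << 10) - 1; // boundary mask -> average chunk ~1 KiB

    /** Content-defined chunking: a boundary is placed wherever the
     *  Rabin-Karp rolling hash of the last WINDOW bytes matches MASK. */
    static List<byte[]> chunk(byte[] data) {
        long basePow = 1;                   // BASE^(WINDOW-1), used to drop the oldest byte
        for (int i = 0; i < WINDOW - 1; i++) basePow *= BASE;
        List<byte[]> chunks = new ArrayList<>();
        long hash = 0;
        int start = 0;                      // start of the current chunk
        for (int i = 0; i < data.length; i++) {
            if (i - start >= WINDOW) {      // window full: roll off the oldest byte
                hash -= (data[i - WINDOW] & 0xFFL) * basePow;
            }
            hash = hash * BASE + (data[i] & 0xFFL);
            if (i - start + 1 >= WINDOW && (hash & MASK) == MASK) {
                chunks.add(Arrays.copyOfRange(data, start, i + 1));
                start = i + 1;
                hash = 0;                   // restart the window for the next chunk
            }
        }
        if (start < data.length) {          // trailing partial chunk
            chunks.add(Arrays.copyOfRange(data, start, data.length));
        }
        return chunks;
    }

    /** Hex SHA-256 of a chunk; identical chunks get identical keys,
     *  so the server only needs to store each key once. */
    static String digest(byte[] chunk) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest(chunk)) sb.append(String.format("%02x", b));
        return sb.toString();
    }
}
```

The point of cutting on content rather than at fixed offsets is that inserting a byte near the start of a file only changes the chunks around the insertion, so the rest of the chunks still deduplicate against the previously uploaded version.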
Where would I be able to find more information on this? I have tried looking around the Internet, but I can't find much. Most of what exists is broken down into specific implementations, without explanation or details on file chunking or Rabin-Karp hashing.
I would also like to know which Java libraries I should look into for this program.
It would be easier if you stated your problem constraints. Assuming the following:
You can probably narrow down your problem. This can be refined depending on the underlying data.
However, this is how I would approach the problem; given its structure, it can easily be partitioned and solved in parallel. Feel free to elaborate more so that we can reach a good solution.
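As a sketch of the parallel angle: once files (or chunks) are independent byte blobs, hashing them is embarrassingly parallel, and the dedup index is just a concurrent map from digest to one canonical copy. The class and method names below are my own; this assumes the blobs already fit in memory:

```java
import java.security.MessageDigest;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

public class ParallelDedupIndex {
    /** Hash every blob in parallel; the map keeps one canonical blob index
     *  per digest, so any other blob with the same digest is a duplicate. */
    static Map<String, Integer> buildIndex(List<byte[]> blobs) {
        Map<String, Integer> canonical = new ConcurrentHashMap<>();
        IntStream.range(0, blobs.size())
                 .parallel()
                 .forEach(i -> canonical.putIfAbsent(digest(blobs.get(i)), i));
        return canonical;
    }

    static String digest(byte[] b) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte x : md.digest(b)) sb.append(String.format("%02x", x));
            return sb.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Note that with a parallel stream the "canonical" copy for a digest is whichever thread wins the `putIfAbsent` race; that is fine for deduplication, since all blobs with the same digest have the same content.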