
How to find longest common subsequence in multiple binary files?

I am given 10 binary files and I have to find the longest strand of bytes that is identical between two or more files. Any help is appreciated. Thanks in advance.

  • A brute-force approach would be to compare each file with every other file. That is 10 * 9 = 90 ordered comparisons, but since comparing A with B gives the same result as comparing B with A, 10 * 9 / 2 = 45 comparisons are enough.

  • To compare any two files, you could run through them bytewise and check whether the bytes at each position are equal, storing the longest matching sequence found so far along the way. Any time a sequence breaks, you start a new temporary sequence and only store it when it is longer than the best one so far. Note that this only finds strands that sit at the same offset in both files.

  • Another, somewhat similar approach is to use dynamic programming for the longest common subsequence (LCS). It requires more memory than the previous approach, so whether it is practical depends on the size of the files, among other things. For a contiguous strand of bytes, the closely related longest common substring variant is the one that applies. For this approach there are plenty of resources with graphic visualizations and pseudo-code of the algorithm.
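The approaches above could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tuned implementation: the function names (`longest_aligned_run`, `longest_common_substring`, `longest_shared_strand`) are made up for this example, the files are assumed to fit in memory as `bytes`, and the dynamic-programming part uses the longest common substring variant because the question asks for a contiguous strand.

```python
from itertools import combinations


def longest_aligned_run(a: bytes, b: bytes) -> tuple:
    """Return (start, length) of the longest run where a[i] == b[i].

    Only matches at the same offset in both inputs are found.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            if run_len == 0:
                run_start = i  # a new temporary sequence begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # the sequence broke
    return best_start, best_len


def longest_common_substring(a: bytes, b: bytes) -> bytes:
    """Longest contiguous run of bytes present in both a and b, at any offsets.

    Dynamic programming over suffix match lengths: O(len(a) * len(b)) time.
    Only the previous row is kept, so memory is O(len(b)).
    """
    prev = [0] * (len(b) + 1)
    best_len, best_end = 0, 0  # best_end is an exclusive end index into a
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                if cur[j] > best_len:
                    best_len, best_end = cur[j], i
        prev = cur
    return a[best_end - best_len:best_end]


def longest_shared_strand(blobs: dict) -> tuple:
    """Compare every unordered pair once and keep the longest shared strand.

    blobs maps a (hypothetical) file name to that file's contents.
    Returns (strand, (name_a, name_b)).
    """
    best, best_pair = b"", ("", "")
    for (name_a, data_a), (name_b, data_b) in combinations(blobs.items(), 2):
        strand = longest_common_substring(data_a, data_b)
        if len(strand) > len(best):
            best, best_pair = strand, (name_a, name_b)
    return best, best_pair
```

To run it on real files, the dictionary could be built with something like `{p: pathlib.Path(p).read_bytes() for p in paths}`, where `paths` is your own list of file names. Because the running time is quadratic in the file sizes, this sketch is only reasonable for fairly small files.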
