
Best way to compare the content of two flat files

We have many pipe-separated (|) flat files, which we process daily in SQL Server using an SSIS package. Each flat file is divided into a header section, a content section, and a footer section. We regularly receive newer versions of the same files. We are trying to implement a comparison between two versions of the same file to reduce the processing load.

Which method would be more efficient?

  1. Store both versions of the file in separate SQL Server tables with a checksum column, and filter for the rows whose checksum values do not match.

  2. Implement similar checksum logic in C#, or use any other comparison algorithm available in C#.

Feel free to suggest any other algorithm that achieves the same result.
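To make the second option concrete, here is a minimal sketch of row-level checksum comparison done outside the database. It is written in Python rather than C# purely for illustration. The sample rows, the helper names (`content_rows`, `digest`, `new_or_changed`), and the assumption that the header and footer are exactly one line each are all hypothetical and not taken from the actual files.

```python
import hashlib


def content_rows(lines, header_lines=1, footer_lines=1):
    # Assumption: header and footer each span a fixed number of lines.
    return lines[header_lines:len(lines) - footer_lines]


def digest(row):
    # SHA-256 of the raw row text; any stable hash would serve the same purpose.
    return hashlib.sha256(row.encode("utf-8")).digest()


def new_or_changed(old_lines, new_lines):
    """Return content rows of the new version whose digest is absent from the old one."""
    seen = {digest(row) for row in content_rows(old_lines)}
    return [row for row in content_rows(new_lines) if digest(row) not in seen]


# Hypothetical sample data: one header line, content rows, one footer line.
old_file = ["HDR|2024-01-01", "1|alice|10", "2|bob|20", "FTR|2"]
new_file = ["HDR|2024-01-02", "1|alice|10", "2|bob|25", "3|carol|30", "FTR|3"]

delta = new_or_changed(old_file, new_file)
print(delta)
```

Only the rows in `delta` would then need to be processed. Note that this sketch compares whole rows, so it reports a modified row as new and does not detect rows deleted from the old version.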

Well, if you are already loading both versions into SQL Server, a fast way would be to use EXCEPT or INTERSECT, depending on what your goal is.

select * from version2
except
select * from version1

This returns the rows in version2 that do not exactly match any row in version1. You could also select only a single column if you want to compare on just that column.
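As a small runnable illustration of the set operators, the sketch below uses Python's built-in sqlite3 module with an in-memory database. This is an assumption made for the sake of a self-contained example: SQLite stands in for SQL Server here, and the column names and sample rows are hypothetical. The EXCEPT and INTERSECT statements themselves are written the same way in T-SQL.

```python
import sqlite3

# In-memory SQLite database standing in for SQL Server (assumption for this demo).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version1 (id INTEGER, name TEXT, amount INTEGER);
    CREATE TABLE version2 (id INTEGER, name TEXT, amount INTEGER);
    INSERT INTO version1 VALUES (1, 'alice', 10);
    INSERT INTO version1 VALUES (2, 'bob', 20);
    INSERT INTO version2 VALUES (1, 'alice', 10);
    INSERT INTO version2 VALUES (2, 'bob', 25);
    INSERT INTO version2 VALUES (3, 'carol', 30);
""")

# Rows in version2 with no exact match in version1 (new or modified rows).
changed = sorted(conn.execute(
    "SELECT * FROM version2 EXCEPT SELECT * FROM version1"
).fetchall())

# Rows that are identical in both versions and can be skipped.
unchanged = sorted(conn.execute(
    "SELECT * FROM version2 INTERSECT SELECT * FROM version1"
).fetchall())

# Comparing on a single column: ids present only in version2.
new_ids = sorted(conn.execute(
    "SELECT id FROM version2 EXCEPT SELECT id FROM version1"
).fetchall())

print(changed, unchanged, new_ids)
conn.close()
```

Both operators compare entire rows and return distinct rows, so duplicate rows within a file collapse to one in the result.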

