
Best way to compare the content of two flat files

We have many pipe-separated (|) flat files that we process daily in SQL Server using an SSIS package. Each flat file is divided into a header section, a content section, and a footer section. We regularly receive newer versions of the same files. We are trying to implement a comparison between two versions of the same file to reduce the processing load.

Which method would be more efficient?

  1. Storing both versions of the same file in separate SQL Server tables with a checksum column, then filtering out the rows whose checksum values do not match.

  2. Implementing similar checksum logic in C#, or using any other comparison algorithm available in C#.

You may also suggest any other algorithm that achieves the same result.
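To make the second option concrete, here is a minimal sketch of row-level checksum comparison done outside the database. It is written in Python rather than C# purely for illustration. The file layout (exactly one header line and one footer line), the sample rows, and the helper names `row_digests` and `changed_rows` are all assumptions, not part of the actual files or package.

```python
import hashlib


def row_digests(lines, header_rows=1, footer_rows=1):
    """Map SHA-256 digest -> row text for the content section only.

    Assumes the header and footer occupy a fixed number of lines.
    """
    end = len(lines) - footer_rows
    body = lines[header_rows:end]
    return {hashlib.sha256(row.encode("utf-8")).hexdigest(): row for row in body}


def changed_rows(old_lines, new_lines, header_rows=1, footer_rows=1):
    """Return content rows of the new version whose digest is absent from the old one."""
    old = row_digests(old_lines, header_rows, footer_rows)
    new = row_digests(new_lines, header_rows, footer_rows)
    return [row for digest, row in new.items() if digest not in old]


# Hypothetical sample data: header line, pipe-separated content rows, footer line.
old_version = ["HDR|2024-01-01", "1|alice|10", "2|bob|20", "FTR|2"]
new_version = ["HDR|2024-01-02", "1|alice|10", "2|bob|25", "3|carol|30", "FTR|3"]

print(changed_rows(old_version, new_version))
```

Note that keying on the digest collapses duplicate rows, so this sketch reports which distinct rows are new or modified, not how many times each one occurs.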

Well, if you are already loading both of these into SQL Server, then a fast way would be to use EXCEPT or INTERSECT, depending on what your goal is.

select * from version2
except
select * from version1

This returns the rows in version2 that do not exactly match any row in version1. You could also select only a single column if you want to compare on that alone.
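For anyone who wants to try the set operators without a SQL Server instance, the following sketch runs the same kind of query against an in-memory SQLite database from Python. SQLite is only a stand-in here; its EXCEPT and INTERSECT behave the same way for this simple case, but the table layout and sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for SQL Server (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE version1 (id INTEGER, name TEXT, amount INTEGER);
    CREATE TABLE version2 (id INTEGER, name TEXT, amount INTEGER);
    INSERT INTO version1 VALUES (1, 'alice', 10), (2, 'bob', 20);
    INSERT INTO version2 VALUES (1, 'alice', 10), (2, 'bob', 25), (3, 'carol', 30);
""")

# Rows in version2 with no exact match in version1 (new or modified rows).
new_or_changed = sorted(conn.execute(
    "SELECT * FROM version2 EXCEPT SELECT * FROM version1"
).fetchall())

# Rows present in both versions (unchanged rows).
unchanged = sorted(conn.execute(
    "SELECT * FROM version2 INTERSECT SELECT * FROM version1"
).fetchall())

print(new_or_changed)
print(unchanged)
conn.close()
```

Swapping the two SELECT statements around EXCEPT gives the opposite direction: rows that existed in version1 but are missing or different in version2.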

