Comparing two CSV files based on multiple columns and saving the result in a separate file

I have two files in the same format; one contains newer updates and the other older ones. There is no particular unique ID column.

How can I extract only the new or updated lines (with Unix tools, PHP, or AWK)?

You want to compare every line byte for byte against the lines of the other file, so I would do:

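// Read both files into arrays of lines. Trailing newlines are dropped so a
// missing newline at the end of one file cannot cause a false mismatch.
$lines1 = file('file1.txt', FILE_IGNORE_NEW_LINES); // older file
$lines2 = file('file2.txt', FILE_IGNORE_NEW_LINES); // newer file

// Lookup table: CRC32 checksum => all lines of file1 with that checksum.
$lookup = array();

foreach ($lines1 as $line) {
  $key = crc32($line);
  if (!isset($lookup[$key])) $lookup[$key] = array();
  $lookup[$key][] = $line;
}

foreach ($lines2 as $line) {
  $key = crc32($line);

  // Checksums can collide, so confirm with an exact string comparison.
  $found = false;
  if (isset($lookup[$key])) {
    foreach ($lookup[$key] as $lookupLine) {
      if ($lookupLine === $line) {
        $found = true;
        break;
      }
    }
  }

  // Not found in file1: the line is new or changed.
  if (!$found) {
    echo $line, "\n"; // or write it to a file instead
  }
}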
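
Since the question also mentions AWK, the same lookup-table idea can be sketched as a small shell function. This is not part of the answer above, only an illustrative equivalent; the function name `print_new_lines` and the sample file names are made up for the example. It assumes the older file is not empty, because the `NR == FNR` test cannot tell the two files apart otherwise.

```shell
#!/bin/sh
# print_new_lines OLD NEW
# Prints every line of NEW that does not occur anywhere in OLD.
# (Hypothetical helper name, chosen for this sketch.)
print_new_lines() {
  # NR == FNR is true only while awk reads the first file: remember each line.
  # For the second file, print the lines that were never remembered.
  awk 'NR == FNR { seen[$0]; next } !($0 in seen)' "$1" "$2"
}

# Small demonstration with throwaway sample data.
demo=$(mktemp -d)
printf 'a,1,x\nb,2,y\n' > "$demo/old.csv"
printf 'a,1,x\nb,3,y\nc,4,z\n' > "$demo/new.csv"
print_new_lines "$demo/old.csv" "$demo/new.csv" > "$demo/updates.csv"
cat "$demo/updates.csv"
rm -rf "$demo"
```

Like the PHP version, this keeps all lines of the older file in memory and compares whole lines exactly.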

Note that if the files are very large, loading them into memory this way will consume a lot of memory, and you will need some other mechanism, but the idea stays the same.
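
One possible "other mechanism" for large files, sketched here as an assumption rather than something the answer prescribes, is to sort both files and let `comm` walk them in parallel. Implementations such as GNU `sort` spill to temporary files, so neither input has to fit in memory. The function name `new_lines_sorted` is hypothetical, and the result comes out in sorted order, not in the original line order.

```shell
#!/bin/sh
# new_lines_sorted OLD NEW
# Prints the lines that occur only in NEW, in byte-wise sorted order.
# (Hypothetical helper name, chosen for this sketch.)
new_lines_sorted() {
  work=$(mktemp -d)
  # sort and comm must agree on the ordering; LC_ALL=C forces plain byte order.
  LC_ALL=C sort "$1" > "$work/old.sorted"
  LC_ALL=C sort "$2" > "$work/new.sorted"
  # -1 hides lines found only in OLD, -3 hides lines found in both files,
  # which leaves the lines found only in NEW.
  LC_ALL=C comm -13 "$work/old.sorted" "$work/new.sorted"
  rm -rf "$work"
}

# Small demonstration with throwaway sample data.
demo=$(mktemp -d)
printf 'b,2,y\na,1,x\n' > "$demo/old.csv"
printf 'c,4,z\na,1,x\nb,3,y\n' > "$demo/new.csv"
new_lines_sorted "$demo/old.csv" "$demo/new.csv" > "$demo/updates.csv"
cat "$demo/updates.csv"
rm -rf "$demo"
```

The trade-off is extra disk I/O for the two sorts in exchange for bounded memory use.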
