Simplest way to join two files from the Unix command line, inserting zero entries for missing keys

Question

I'm trying to join two files, each of which contains rows of the form <key> <count>. Each file contains a few lines that are missing from the other, and I would like zero inserted for all such values rather than having these lines omitted (I've seen -a, but this isn't quite what I'm looking for). Is there a simple way to accomplish this?

Here is some sample input:

a.txt

apple 5
banana 7

b.txt

apple 6
cherry 4

Expected output:

apple 5 6
banana 7 0
cherry 0 4

Answer 1

join -o 0,1.2,2.2 -e 0 -a1 -a2 a.txt b.txt

-o 0,1.2,2.2 → output the join field, then the 2nd field of the 1st file, then the 2nd field of the 2nd file.
-e 0 → output 0 for empty (missing) input fields.
-a1 -a2 → also print unpairable lines from file 1 and file 2.
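
One caveat the answer leaves implicit: join only pairs lines correctly when both inputs are sorted on the join field, and the sample files just happen to be sorted already. Below is a minimal sketch that sorts first. It assumes bash (for <(...) process substitution), and join_with_zeros is a hypothetical helper name. LC_ALL=C is set on both sort and join so they agree on collation order.

```bash
# Hypothetical helper: join_with_zeros FILE1 FILE2
# Full outer join on the first field, filling missing counts with 0.
# Each input is sorted first, because join expects both files to be
# sorted on the join field in the same collation order it uses itself.
join_with_zeros() {
    LC_ALL=C join -o 0,1.2,2.2 -e 0 -a1 -a2 \
        <(LC_ALL=C sort -k1,1 "$1") \
        <(LC_ALL=C sort -k1,1 "$2")
}
```

With the sample files, join_with_zeros a.txt b.txt prints the three expected lines: apple 5 6, banana 7 0, cherry 0 4.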

Answer 2

Write a script in whatever language you like. Parse both files into a map/hashtable/dictionary data structure (let's just call it a dictionary). Each dictionary uses the first word of a line as the key and the count (or even a string of counts) as the value. Here is some pseudocode of the algorithm:

Dict fileA, fileB; // Already parsed
while(!fileA.isEmpty()) {
    string check = fileA.top().key();
    int val1 = fileA.top().value();
    if(fileB.contains(check)) {
        printToFile(check + " " + val1 + " " + fileB.getValue(check));
        fileB.remove(check);
    }
    else {
        printToFile(check + " " + val1 + " 0");
    }
    fileA.pop();
}
while(!fileB.isEmpty()) {      // Remaining keys do not exist in fileA
    string check = fileB.top().key();
    int val1 = fileB.top().value();
    printToFile(check + " 0 " + val1);
    fileB.pop();
}

You can use any kind of iterator to walk the data structure instead of pop and top. You may need to access the data differently depending on the language and data structure you use.
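
To make the dictionary approach concrete, here is one possible sketch in bash rather than the pseudocode's C-like syntax. It assumes bash 4 or later (for associative arrays) and lines of the form <key> <count>. The function name merge_counts is hypothetical. Associative arrays have no defined iteration order, so the output is piped through sort.

```bash
# Hypothetical helper: merge_counts FILE1 FILE2 (requires bash >= 4).
merge_counts() {
    local -A a b
    local k v
    # Parse each file into a dictionary: key -> count.
    # "|| [[ -n $k ]]" keeps a final line that lacks a trailing newline.
    while read -r k v || [[ -n $k ]]; do
        [[ -n $k ]] && a[$k]=$v
    done < "$1"
    while read -r k v || [[ -n $k ]]; do
        [[ -n $k ]] && b[$k]=$v
    done < "$2"
    {
        # Keys from the first file; use 0 when the second file lacks the key.
        for k in "${!a[@]}"; do
            echo "$k ${a[$k]} ${b[$k]:-0}"
            unset 'b[$k]'
        done
        # Whatever is left in b exists only in the second file.
        for k in "${!b[@]}"; do
            echo "$k 0 ${b[$k]}"
        done
    } | LC_ALL=C sort -k1,1
}
```

Unlike join, this does not need pre-sorted input. The cost is that both files are held in memory.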

Answer 3

@ninjalj's answer is much saner, but here's a shell script implementation just for fun:

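#!/usr/bin/env bash
# Merge join: both files must already be sorted on the key, in the same
# collation order that [[ < ]] uses (e.g. run everything under LC_ALL=C).
exec 8< a.txt
exec 9< b.txt

k1= k2=
while true; do
    # Refill whichever side was consumed on the previous pass.
    if [[ -z $k1 ]]; then
        read -r k1 v1 <&8
    fi
    if [[ -z $k2 ]]; then
        read -r k2 v2 <&9
    fi
    # Both files exhausted.
    if [[ -z $k1 && -z $k2 ]]; then break; fi
    if [[ $k1 == "$k2" ]]; then
        echo "$k1 $v1 $v2"
        k1= k2=
    elif [[ -n $k1 && ( -z $k2 || $k1 < $k2 ) ]]; then
        # Key only in a.txt, or b.txt has run out.
        echo "$k1 $v1 0"
        k1=
    else
        # Key only in b.txt, or a.txt has run out.
        echo "$k2 0 $v2"
        k2=
    fi
done

exec 8<&- 9<&-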

Question description

3 solutions

Solution 1: 11 votes, accepted, 2011-10-25 20:47:19

Solution 2: 0 votes, 2011-10-25 20:52:57

Solution 3: 0 votes, 2011-10-25 21:06:12
