Joining multiple unsorted text files

Question

I have a number of single-column text files containing unsorted values. I want to join them, but the Linux "join" utility requires its input files to be sorted. Any idea how to do this without sorting?

A.txt

B.txt

C.txt

Desired Output:

0000;
0003;

Answer 1

To overcome the "number of files beforehand" and the "repetitive elements" problems of twalberg's fine awk program, I'd use the more verbose:

#!/usr/bin/env python3

from sys import argv, exit

if len(argv) < 2:
    exit("usage: %s FILE..." % argv[0])

# collect all lines from each file in their own set
# (newlines stripped, so a last line without one still matches)

sets = []
for path in argv[1:]:
    with open(path) as infile:
        s = set(line.rstrip("\n") for line in infile)
        sets.append(s)

# find the common items in all sets

common = sets[0]
for s in sets[1:]:
    common = common.intersection(s)

# print the common items in the order they appear in the
# first file

with open(argv[1]) as infile:
    for line in infile:
        line = line.rstrip("\n")
        if line in common:
            common.remove(line) # prevents duplicates
            print(line)

Answer 2

This, I believe, requires GNU awk (4.0 or later) for the array of arrays. Note that "for (item in seen)" visits the items in no particular order:

gawk '
    FNR == 1 {nfiles++}
    {seen[$1][FILENAME] = 1} 
    END {for (item in seen) if (length(seen[item]) == nfiles) print item}
' A.txt B.txt C.txt

0000;
0003;
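
For anyone without gawk, the same counting idea can be sketched in Python: remember which files each item was seen in, then keep the items seen in all of them. This is only a rough equivalent, not a translation. The function name common_items is made up for illustration, blank lines are skipped (the awk version would treat them as an empty item), and the files are counted from the argument list rather than from FNR == 1.

```python
from collections import defaultdict


def common_items(paths):
    """Return the items (first field of each line) found in every file.

    Hypothetical helper, loosely mirroring the gawk one-liner above.
    """
    seen = defaultdict(set)  # item -> set of files it was seen in
    for path in paths:
        with open(path) as infile:
            for line in infile:
                fields = line.split()
                if fields:  # assumption: ignore blank lines
                    seen[fields[0]].add(path)
    # dicts keep insertion order, so items come out in first-seen order
    return [item for item, files in seen.items() if len(files) == len(paths)]
```

Calling common_items(["A.txt", "B.txt", "C.txt"]) returns the shared items as a list. Unlike the awk loop, the order is predictable: the order in which each item was first encountered.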

Answer 3

TXR Lisp solution:

(defvar hash-list
  (collect-each ((a *args*))
    (hash-construct '(:equal-based) (zip (get-lines (open-file a))))))

(if hash-list
  (dohash (key val [reduce-left hash-isec hash-list])
    (put-line key)))

$ txr join.tl
$ txr join.tl A.txt
0000;
0001;
0002;
0003;
$ txr join.tl A.txt B.txt C.txt
0000;
0003;
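
As I read it, the program above builds one key-only hash per file and folds the list of hashes with an intersection, doing nothing when no files are given. Below is a rough Python sketch of that shape, using functools.reduce as the left fold. The names read_keys, isec and join_files are invented for this example, and the dict ordering here (first file's order) is a property of Python, not a claim about the order in which TXR walks a hash.

```python
from functools import reduce


def read_keys(path):
    """Map each line of the file (newline stripped) to None: a key-only table."""
    with open(path) as infile:
        return {line.rstrip("\n"): None for line in infile}


def isec(left, right):
    """Keep the entries of `left` whose keys also occur in `right`."""
    return {key: val for key, val in left.items() if key in right}


def join_files(paths):
    """Fold all per-file tables into one intersection and return its keys."""
    tables = [read_keys(p) for p in paths]
    # Guard the empty case; reduce() on an empty list would raise TypeError.
    return list(reduce(isec, tables)) if tables else []
```

With a single file the fold returns that file's table unchanged, so join_files(["A.txt"]) lists every distinct line of A.txt, matching the one-argument run shown above.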

3 answers

solution1
0 ACCEPTED 2014-04-29 19:08:21

solution2
0 2014-04-29 19:26:46

solution3
0 2014-07-15 00:14:22
