Join two files on Linux

I have two files; I want to join them.

$ cat t1
 1 1.2
 2 2.2
$ cat t2
 1
 2
 1

I want to have the output below

$ cat joind.txt
 1 1.2
 2 2.2
 1 1.2

but when I use the join command, the third line does not appear in the output.

A simple awk one-liner suffices for this:

awk 'FNR==NR{a[$1]=$2;next} {print $1, a[$1]}' t1 t2
1 1.2
2 2.2
1 1.2

Breakdown:

NR == FNR {                  # While processing the first file
  a[$1] = $2                 # store the second field by the first
  next                       # move to next record in 1st file
}
{                            # while processing the second file
  print $1, a[$1]            # print $1 and the remembered
                             # value from the first file.
}
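One case the breakdown above leaves open is a key in t2 that never appears in t1: the one-liner would then print the key followed by an empty value. The following is a minimal sketch of one way to skip such keys, using awk's `in` operator. The scratch directory, the `lookup` function name, and the extra key 3 are assumptions made for illustration and are not part of the question.

```shell
# Sample inputs in a scratch directory. The t1 contents follow the question;
# the key 3 in t2 is a hypothetical key that is missing from t1.
dir=$(mktemp -d)
printf '1 1.2\n2 2.2\n' > "$dir/t1"
printf '1\n2\n3\n1\n' > "$dir/t2"

# lookup LOOKUP_FILE KEY_FILE (hypothetical helper name)
# Same two-pass idea as above, but only keys that were stored are printed.
lookup() {
  awk 'FNR==NR { a[$1] = $2; next }   # first file: remember value by key
       ($1 in a) { print $1, a[$1] }  # second file: skip unknown keys
      ' "$1" "$2"
}

lookup "$dir/t1" "$dir/t2"
```

Testing with `$1 in a` rather than checking whether `a[$1]` is empty avoids creating empty array entries as a side effect of the lookup.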

join requires both files to be sorted on the join field. If you sort them first, you'll get all of the expected lines:

$ sort t1 > t1.sorted
$ sort t2 > t2.sorted
$ join -j1 -o 1.1,1.2 t1.sorted t2.sorted
1 1.2
1 1.2
2 2.2

Without the sort:

$ join -j1 -o 1.1,1.2 t1 t2
1 1.2
2 2.2

This assumes that the order of your input doesn't need to be preserved; if it does, you will need a custom script like the ones other answers have provided.
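If join is preferred and the order of t2 still matters, one possible approach is to tag each key with its line number before sorting, join, and then sort on that number again. This is only a sketch under assumptions: the scratch directory, the intermediate file names, and the `join_keep_order` function are made up for illustration, and the fields are assumed to be separated by single spaces in the join output.

```shell
# Sample inputs in a scratch directory (contents follow the question).
dir=$(mktemp -d)
printf '1 1.2\n2 2.2\n' > "$dir/t1"
printf '1\n2\n1\n' > "$dir/t2"

# join_keep_order LOOKUP_FILE KEY_FILE (hypothetical helper name)
join_keep_order() {
  sort -k1,1 "$1" > "$dir/t1.sorted"
  # Decorate: append the original line number to every key, then sort by key.
  awk '{ print $1, NR }' "$2" | sort -k1,1 > "$dir/t2.numbered"
  # Emit "line-number key value", restore the original order, drop the number.
  join -o 2.2,1.1,1.2 "$dir/t1.sorted" "$dir/t2.numbered" |
    sort -n -k1,1 |
    cut -d' ' -f2-
}

join_keep_order "$dir/t1" "$dir/t2"
```

With the sample files this prints the three lines in the order given by t2, at the cost of two extra sorts compared with the awk approach.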

Something like the following will do:

$ while IFS= read -r line; do grep -m 1 "^$line" t1; done <t2
 1 1.2
 2 2.2
 1 1.2
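Note that the pattern "^$line" is a prefix match, so a key such as 1 would also match a line that starts with 10. A hedged variant that requires whitespace right after the key is sketched below. The sample data with key 10, the scratch directory, and the `lookup_exact` function name are assumptions for illustration, and the keys are assumed to contain no regular-expression metacharacters.

```shell
# Sample inputs: key 10 is a hypothetical entry placed before key 1
# to show the difference from a plain prefix match.
dir=$(mktemp -d)
printf '10 9.9\n1 1.2\n2 2.2\n' > "$dir/t1"
printf '1\n2\n' > "$dir/t2"

# lookup_exact LOOKUP_FILE KEY_FILE (hypothetical helper name)
lookup_exact() {
  while IFS= read -r key; do
    # Allow optional leading blanks, then the key, then whitespace,
    # so that key 1 does not match the line for key 10.
    grep -m 1 -E "^[[:space:]]*${key}[[:space:]]" "$1"
  done < "$2"
}

lookup_exact "$dir/t1" "$dir/t2"
```

Like the loop above, this starts one grep process per key, so it is best suited to small key files.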

If I understand correctly, you want to match the first column of t1 against the values in t2. So t1 is a dictionary and t2 holds the wanted keys.

If so, you can use this:

$ cat t2 | xargs -n1 -I{} grep -P "^\Q{}\E\s+" t1

How does it work?

xargs executes grep once for each entry of t2 (-n1). The -I{} option lets me place that entry wherever I want in the command.

grep then matches the wanted value in the dictionary using a regular expression.

^    # Any line that begins with
\Q   # Quote the value (in case we have special chars inside it)
{}   # The corresponding value substituted by xargs
\E   # End of quoting
\s+  # Followed by one or more spaces (alternatively we can use `\b`)
.*   # Followed by anything (optional)

t1   # Inside the file `t1`

Alternatively you can play with Perl :)

cat t2 | perl -e '$_ = qx{cat $ARGV[0]};
      $t1{$1} = $2 while(/^(\w+)\s+(.*)/gm);
      print "$_ $t1{$_}\n" for (split "\n", do{local $/, <STDIN>})' t1

You can try AWK:

awk 'NR==FNR{a[$1]=$2}NR>FNR{print $1,a[$1]}' t1 t2
