简体   繁体   中英

Bash - joining (merging) files by columns

There are three files with columns, delimiter is "|", columns in lines might be empty. I need to join second and third files to first using columns 3 and 4 in first file, and column 1 in second and third.

For example:

file1.txt:

123456||4|11|17|A||10|B|1
123457||4|11|17|A||12||1
123458||5|11|17|A||1|Б|1
123459||6|13|17|A||1|Б|1

file2.txt:

4|Forth
5|Fifth
6|Sixth

file3.txt:

11|st.|Eleventh
13|pr.|Thirteenth

What output I desire:

123456||4|Forth|11|st.|Eleventh|17|A||10|B|1
123457||4|Forth|11|st.|Eleventh|17|A||12||1
123458||5|Fifth|11|st.|Eleventh|17|A||1|Б|1
123459||6|Sixth|13|pr.|Thirteenth|17|A||1|Б|1

How to write Bash script that will do what I need? I understand that it's awt command, but I couldn't write the script. Thanks for answers.

You can use this awk command:

awk 'BEGIN{ FS=OFS="|" }
     NR == FNR {a[$1]=$0; next}
     NR == FNR + length(a) {b[$1]=$0; next}
     {$3=b[$3]; $4=a[$4]} 1' file3.txt file2.txt file1.txt
123456||4|Forth|11|st.|Eleventh|17|A||10|B|1
123457||4|Forth|11|st.|Eleventh|17|A||12||1
123458||5|Fifth|11|st.|Eleventh|17|A||1|Б|1
123459||6|Sixth|13|pr.|Thirteenth|17|A||1|Б|1

Explanation:

  • BEGIN{ FS=OFS="|" } BEGIN{ FS=OFS="|" } - Set input and output field separators as pipe |
  • NR == FNR - Execute this block for first file only
  • a[$1]=$0; next a[$1]=$0; next - Create an array a with key as $1 and value as complete line
  • NR == FNR + length(a) - Execute this block for second file only
  • b[$1]=$0; next b[$1]=$0; next - Create an array b with key as $1 and value as complete line
  • Execute next block {...} for the last (3rd) file
  • $3=b[$3]; - Assign value of b[$3] to 3rd field
  • $4=a[$4]; - Assign value of a[$4] to 4th field
  • 1 - Is the default action to print each record

Assuming the files are sorted:

join -t'|' -1 4 -2 1 \
     <(join -t '|' -1 3 -2 1 file1.txt file2.txt) file3.txt

If you really need the fields in a specific order add a output format option:

 -o1.2,1.3,1.1,1.11,1.4,2.2,2.3,1.5,1.6,1.7,1.8,1.9,1.10,1.11

Here it is:

#!/bin/bash

while IFS='|' read c1 c2 c3 c4 c5 c6 c7 c8 c9 c10
        do
        st1=$( fgrep "$c3" file2.txt )
        st2=$( fgrep "$c4" file3.txt )
        echo "$c1|$c2|$st1|$st2|$c5|$c6|$c7|$c8|$c9|$c10"
        done

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM