
Merge multiple files to a single file including unmatched lines in shell

File1.log

207.46.13.90  37556
157.55.39.51  34268
40.77.167.109 21824
157.55.39.253 19683

File2.log

207.46.13.90  62343
157.55.39.51  58451
157.55.39.200 37675
40.77.167.109 21824

File3.log

207.46.13.90  85343
157.55.39.51  59876
157.55.39.200 37675
157.55.39.253 19683

The expected history.log is shown below:

207.46.13.90    37556   62343   85343
157.55.39.51    34268   58451   59876
157.55.39.200   -----   37675   37675
40.77.167.109   21824   21824   -----
157.55.39.253   19683   -----   19683

Using join doesn't work for me. I could get the result with 2 files, as suggested by Ravinder in the other thread: Join two files including unmatched lines in Shell

Also, in the next run I will be adding another file, file4.log, to history.log as a 4th column. Thanks in advance.

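You may use this GNU awk command to combine multiple files that share the same key in the first column. It relies on gawk-only features (arrays of arrays, available since gawk 4.0, and ARGIND), and for (i in a) visits keys in an unspecified order, so the rows may not come out in input order: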

awk -v OFS='\t' '{
   a[$1][ARGIND] = $2
}
END {
   for (i in a) {
      printf "%s", i
      for (j=1; j<ARGC; j++)
         printf "%s", OFS (j in a[i] ? a[i][j] : "-----")
      print ""
   }
}' File*.log

207.46.13.90    37556   62343   85343
40.77.167.109   21824   21824   -----
157.55.39.51    34268   58451   59876
157.55.39.253   19683   -----   19683
157.55.39.200   -----   37675   37675
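
If gawk is not available, or the rows should keep a predictable order, something along these lines might work with any POSIX awk. This is only a sketch: the helper name merge_logs is made up for illustration, rows are printed in the order each key is first seen (which is not exactly the row order shown in the question), and a completely empty input file would not get a column of its own because FNR==1 never fires for it.

```shell
# merge_logs FILE...: merge two-column "key value" files into one table.
# merge_logs is a hypothetical helper name used only for this sketch.
merge_logs() {
  awk -v OFS='\t' '
    FNR == 1 { nfiles++ }              # first line of each non-empty input file
    NF < 2   { next }                  # skip blank or incomplete lines
    !($1 in seen) {                    # first time this key appears
      seen[$1] = 1
      order[++nkeys] = $1              # remember first-seen order
    }
    { val[$1, nfiles] = $2 }           # value of this key in file number nfiles
    END {
      for (k = 1; k <= nkeys; k++) {
        key = order[k]
        line = key
        for (f = 1; f <= nfiles; f++)
          line = line OFS (((key, f) in val) ? val[key, f] : "-----")
        print line
      }
    }
  ' "$@"
}
```

It would be called as merge_logs File1.log File2.log File3.log > history.log. Testing membership with ((key, f) in val) instead of the value itself means a legitimate value of 0 is not mistaken for a missing one.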

Could you please try the following? It should also work for more than 3 input files, although it was tested only with the OP's 3 sample files.

awk '
FNR==1{
  count++
}
{
  a[$1]
  b[count,$1]=$2
}
END{
  for(j in a){
    for(i=1;i<=count;i++){
      printf("%s%s%s",i==1?j OFS:"",b[i,j]?b[i,j]:" ----- ",i==count?ORS:OFS)
    }
  }
}
'  Input_file1  Input_file2  Input_file3 | column -t

Output will be as follows.

207.46.13.90   37556  62343  85343
40.77.167.109  21824  21824  -----
157.55.39.51   34268  58451  59876
157.55.39.253  19683  -----  19683
157.55.39.200  -----  37675  37675

Explanation: a detailed, line-by-line explanation of the above code.

awk '                                                                              ##Start of the awk program.
FNR==1{                                                                            ##True on the first line of each input file.
  count++                                                                          ##Increment count, so it holds the number of the current input file.
}
{
  a[$1]                                                                            ##Create an entry in array a indexed by $1, the first field (the key).
  b[count,$1]=$2                                                                   ##Store $2 in array b under the index (count,$1).
}
END{                                                                               ##Start of the END block, run after all input is read.
  for(j in a){                                                                     ##Loop over every key in array a.
    for(i=1;i<=count;i++){                                                         ##Loop over the input files, from 1 to count.
      printf("%s%s%s",i==1?j OFS:"",b[i,j]?b[i,j]:" ----- ",i==count?ORS:OFS)      ##Print the key j and OFS when i==1, else nothing; then b[i,j] if it is non-empty and non-zero, else " ----- "; then ORS (newline) when i==count, else OFS (space).
    }                                                                              ##End of the for loop over i.
  }                                                                                ##End of the for loop over j.
}                                                                                  ##End of the END block.
'  Input_file1  Input_file2  Input_file3 | column -t                               ##Input file names; the output is piped to column -t to align the columns.
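
For the follow-up in the question (adding file4.log to an existing history.log on the next run), one possible approach is to read the history first and then append a single column, rather than re-merging every file. The sketch below is an assumption about how that could be done, not something from the answers above: the helper name append_column is hypothetical, history.log is assumed to be non-empty and to have the same number of columns on every row, and the output is tab-separated.

```shell
# append_column HISTORY NEWFILE: print HISTORY plus one extra column from NEWFILE.
# append_column is a hypothetical helper; HISTORY is assumed to be non-empty.
append_column() {
  awk -v OFS='\t' '
    NF == 0 { next }                   # skip blank lines in either file
    NR == FNR {                        # first file: the existing history table
      key = $1
      $1 = $1                          # rebuild the record so fields are joined by OFS
      row[key] = $0
      order[++nkeys] = key
      if (NF - 1 > ncols) ncols = NF - 1
      next
    }
    NF >= 2 {                          # second file: the new key/value pairs
      if (!($1 in row)) {              # key not in history: pad the old columns
        row[$1] = $1
        for (c = 1; c <= ncols; c++) row[$1] = row[$1] OFS "-----"
        order[++nkeys] = $1
      }
      new[$1] = $2
    }
    END {
      for (k = 1; k <= nkeys; k++) {
        key = order[k]
        print row[key], ((key in new) ? new[key] : "-----")
      }
    }
  ' "$1" "$2"
}
```

A typical call would write to a temporary file and then replace the history, for example append_column history.log file4.log > history.new && mv history.new history.log, since redirecting straight into history.log would truncate it before awk reads it.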

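Using join, it's a 2-step process, plus an optional formatting step. Note that join requires its input to be lexically sorted, and that the <( ... ) process substitution used here needs a shell such as bash, ksh, or zsh: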

  1. first you need to join the first 2 files:

     join -a 1 -a 2 -e "-----" -o "0,1.2,2.2" <(sort File1.log) <(sort File2.log)

     157.55.39.200 ----- 37675
     157.55.39.253 19683 -----
     157.55.39.51 34268 58451
     207.46.13.90 37556 62343
     40.77.167.109 21824 21824
  2. then join that output with file 3:

     join -a 1 -a 2 -e "-----" -o "0,1.2,1.3,2.2" \
       <( join -a 1 -a 2 -e "-----" -o "0,1.2,2.2" <(sort File1.log) <(sort File2.log) ) \
       <(sort File3.log)

     157.55.39.200 ----- 37675 37675
     157.55.39.253 19683 ----- 19683
     157.55.39.51 34268 58451 59876
     207.46.13.90 37556 62343 85343
     40.77.167.109 21824 21824 -----
  3. if you want, tidy the output with column :

     join -a 1 -a 2 -e "-----" -o "0,1.2,1.3,2.2" \
       <( join -a 1 -a 2 -e "-----" -o "0,1.2,2.2" <(sort File1.log) <(sort File2.log) ) \
       <(sort File3.log) \
       | column -t

     157.55.39.200  -----  37675  37675
     157.55.39.253  19683  -----  19683
     157.55.39.51   34268  58451  59876
     207.46.13.90   37556  62343  85343
     40.77.167.109  21824  21824  -----
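
Nesting one join per file gets unwieldy as more logs are added, so the same idea can probably be wrapped in a loop. The following is a rough sketch under a few assumptions: join_all is a hypothetical helper name, mktemp is available, every input has exactly two columns, and LC_ALL=C is forced so that sort and join agree on the ordering. It uses temporary files instead of process substitution, so it should also run in a plain POSIX sh.

```shell
# join_all FILE...: full outer join of two-column files on the first field.
# join_all and its temporary files are hypothetical, for illustration only.
join_all() {
  acc=$(mktemp) || return 1
  nxt=$(mktemp) || return 1
  LC_ALL=C sort -k1,1 "$1" > "$acc"    # join needs input sorted on the key
  shift
  ncols=1                              # value columns accumulated so far
  for f in "$@"; do
    fmt="0"                            # build the -o list: 0,1.2,...,1.(ncols+1),2.2
    i=2
    while [ "$i" -le $((ncols + 1)) ]; do
      fmt="$fmt,1.$i"
      i=$((i + 1))
    done
    fmt="$fmt,2.2"
    LC_ALL=C sort -k1,1 "$f" |
      LC_ALL=C join -a 1 -a 2 -e "-----" -o "$fmt" "$acc" - > "$nxt"
    mv "$nxt" "$acc"
    ncols=$((ncols + 1))
  done
  cat "$acc"
  rm -f "$acc"
}
```

For example, join_all File1.log File2.log File3.log | column -t should give the same table as step 3, and a later File4.log can simply be added to the argument list.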


 