
Compare the first column and then combine two text files in Linux with a bash script?

I have two text files, file1 and file2. The first one is as follows:

 ID22, abc0o, 1011, h232a, 78m, 928aaa
 ID2344, oklabc, 12as2, 987, 7f82, sas28aas
 ID092, ac, 12, haha, 782oee, gsd839

and the second one

 ID1, 1, 2, 3, 4, 5
 ID22, 6, 7
 ID097222, 8, 9, 10
 ID67, 11, 12, 13, 14, 1
 ID2344, 8, 17, 23, 7, 555
 ID2328999, 642, 43, 34, 34, 121
 ID2344, 2111, 12
 ID22, 1212, 9999, 23, 232, 96564
 ID092, 1010, 1111, 1213, 1415, 18718
 ID2328999, 9999, 333, 222, 7f82, 28
 ID22, 8888, 777, 4444
 ID2344, 220020, 666, 555, 782m, 839

What I would like to do is look up the first column of each file1 line in file2, append the rest of every matching file2 line to that file1 line, and save the result to another file. The order should be preserved too. The values in the first column of file1 are unique. The result should be as below.

ID22, abc0o, 1011, h232a, 78m, 928aaa, 6, 7, 1212, 9999, 23, 232, 96564, 8888, 777, 4444
ID2344, oklabc, 12as2, 987, 7f82, sas28aas, 8, 17, 23, 7, 555, 2111, 12, 220020, 666, 555, 782m, 839
ID092, ac, 12, haha, 782oee, gsd839, 1010, 1111, 1213, 1415, 18718

Could you please try the following, written and tested with the shown samples in GNU awk.

awk '
BEGIN{
  FS=OFS=", "
}
{
  first=$1
  $1=""
  sub(/^, +/,"")
}
FNR==NR{
  arr[first]=$0
  next
}
(first in arr){
  arr[first]=(arr[first]?arr[first] OFS:"")$0
}
END{
  for(key in arr){
    print key,arr[key]
  }
}
' file1 file2

Explanation: a detailed explanation of the above solution.

awk '                          ##Starting awk program from here.
BEGIN{                         ##Starting BEGIN section of this program from here.  
  FS=OFS=", "                  ##Setting field separator and output field separator as comma space
}
{
  first=$1                     ##Creating first with value of 1st field here.
  $1=""                        ##Nullifying first field here.
  sub(/^, +/,"")                ##Removing the leading ", " left behind by the emptied first field.
}
FNR==NR{                       ##Checking condition which will be TRUE when file1 is being read.
  arr[first]=$0                ##Creating arr with index of first and value of current line.
  next                         ##next will skip all further statements from here.
}
(first in arr){                ##Checking condition if first is present in arr then do following.
  arr[first]=(arr[first]?arr[first] OFS:"")$0  ##Keep adding current line value into arr[first] value.
}
END{                           ##Starting END block of this program from here.
  for(key in arr){             ##Traversing through arr here (for-in order is unspecified in awk).
    print key,arr[key]         ##Printing index of arr and value of arr here.
  }
}
' file1 file2                  ##Mentioning Input_file names here.
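
The question asks for the order of file1 to be kept, but a for (key in arr) loop visits keys in an unspecified order. A minimal sketch of one way around that is to record each key's position while file1 is read and loop by index in END. The order and line arrays, the temporary directory and the trimmed sample data are assumptions made for illustration; they are not part of the answer above, and the samples are assumed to have no leading spaces.

```shell
# Hypothetical sample files, shortened from the question's data.
tmp=$(mktemp -d)
printf '%s\n' 'ID22, abc0o, 1011' 'ID2344, oklabc, 12as2' 'ID092, ac, 12' > "$tmp/file1"
printf '%s\n' 'ID1, 1, 2' 'ID22, 6, 7' 'ID2344, 8, 17' 'ID22, 1212, 9999' 'ID092, 1010' > "$tmp/file2"

merged=$(awk '
BEGIN { FS = OFS = ", " }
FNR == NR {                 # first file: remember each line and its position
  order[++n] = $1
  line[$1] = $0
  next
}
($1 in line) {              # second file: keep only what follows the key
  rest = substr($0, length($1) + length(FS) + 1)
  if (rest != "") line[$1] = line[$1] OFS rest
}
END {                       # print in the order file1 was read
  for (i = 1; i <= n; i++) print line[order[i]]
}
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$merged"
rm -rf "$tmp"
```

With these shortened samples the lines come out in file1 order (ID22, ID2344, ID092), each followed by its matches in the order they appear in file2. Redirecting the awk output to a file instead of a variable would cover the "save to other file" part of the question.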

Another awk, the straightforward way:

$ awk '
NR==FNR {
    k=$1                   # store $1 as key k
    $1=""                  # null $1
    a[k]=a[k] "," $0       # append records excluding the $1 to a[k]
    next
}
$1 in a {
    print $0 a[$1]         # output
}' file2 file1             # mind the order: file2 is read first

Output:

 ID22, abc0o, 1011, h232a, 78m, 928aaa, 6, 7, 1212, 9999, 23, 232, 96564, 8888, 777, 4444
 ID2344, oklabc, 12as2, 987, 7f82, sas28aas, 8, 17, 23, 7, 555, 2111, 12, 220020, 666, 555, 782m, 839
 ID092, ac, 12, haha, 782oee, gsd839, 1010, 1111, 1213, 1415, 18718

awk -F, '
NR==FNR {
    map[$1]=$0
}
FNR!=NR {
    if (map[$1] != "") {
        map[$1]=map[$1]","$0
    }
}
END {
    for (i in map) {
        if (map[i] != "") {
            print map[i]
        }
    }
}' file1 file2

awk -F, 'NR==FNR { map[$1]=$0 } FNR!=NR { if (map[$1] != "" ) { map[$1]=map[$1]","$0}  } END { for (i in map) { if (map[i]!="") { print map[i] } } }' file1 file2

Process the first file with awk (NR==FNR): store each line in the array map, indexed by the first comma-separated field. Then, for the second file (FNR!=NR), where map has a non-empty entry for the first comma-delimited field, append the line to that entry. At the end, loop through map and print every value that is not empty. Note that the appended line still contains its ID, and that the for (i in map) loop does not guarantee the order of file1.
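
The reason the END block has to skip empty values is that, in awk, merely referencing map[$1] in a comparison creates that element with an empty value, so every file2 ID that is missing from file1 ends up in map as an empty entry. The in operator tests membership without creating anything. The following small sketch illustrates the difference; the keys ID1 and ID67 are only borrowed from the sample data, and the counts variable is a name made up for the example.

```shell
counts=$(awk 'BEGIN {
  if (map["ID1"] != "") print "not reached"   # comparing the value creates map["ID1"]
  n = 0; for (k in map) n++                   # count elements after the comparison
  if ("ID67" in map) print "not reached"      # the in operator creates nothing
  m = 0; for (k in map) m++                   # count elements after the in test
  print n, m
}')
printf '%s\n' "$counts"
```

Both counts are 1: the comparison added one empty element and the in test added none. Using ($1 in map) as the condition for the second file would therefore make the empty-value check in END unnecessary.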
