
Join two files in Linux

I am trying to join two files but they don't have the same number of lines. I need to join them by the second column.

File1:

 11#San Noor#New York, US
 22#Maria Shiry#Dubai, UA
 55#John Smith#London, England
 66#Viki Sam#Roman, Italy
 81#Sara Moheeb#Montreal, Canada

File2:

 C1#Steve White#11
 C2#Hight Look#21
 E1#The Heaven is more#52
 I1#The Roma Seen#55

The output for paired lines should look like:

 San Noor#Steve White

The output for unpairable lines should look like:

 Sara Moheeb#NA

(After joining, File3 should contain 5 lines and look as follows.)

  San Noor#Steve White
  Maria Shiry#Hight Look
  John Smith#The Heaven is more
  Viki Sam#The Roma Seen
  Sara Moheeb#NA
  

I have tried to join these two files using this command:

join -t '#' -j2 -e "NA" <(sort -t '#' -k2 File1) <(sort -t '#' -k2 File2) > File3

It says that both files are not sorted. Also, I need a way to fill in missing values after join.

Extract relevant columns and paste them together.

paste -d '#' <(cut -d '#' -f2 file1) <(cut -d '#' -f2 file2)

However, this does not produce NA when one file has fewer lines than the other; paste just leaves that field empty. You could pipe it to something along the lines of awk -v OFS='#' -F'#' '{ for (i=1;i<=NF;++i) if (length($i) == 0) $i="NA"; print }' to replace empty fields with the string NA.
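As a rough check of the paste approach, here is a small sketch that recreates the sample data from the question in a scratch directory and runs the whole pipeline. The scratch directory, the intermediate file names (names1, names2) and the variable name result are only illustrative; temporary files are used instead of <(...) so the snippet does not depend on bash process substitution.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '11#San Noor#New York, US' '22#Maria Shiry#Dubai, UA' \
  '55#John Smith#London, England' '66#Viki Sam#Roman, Italy' \
  '81#Sara Moheeb#Montreal, Canada' > "$tmp/file1"
printf '%s\n' 'C1#Steve White#11' 'C2#Hight Look#21' \
  'E1#The Heaven is more#52' 'I1#The Roma Seen#55' > "$tmp/file2"

# Keep only the second '#'-separated column of each file.
cut -d '#' -f2 "$tmp/file1" > "$tmp/names1"
cut -d '#' -f2 "$tmp/file2" > "$tmp/names2"

# paste pairs the lines by position; awk then replaces every empty field
# (left behind when one file runs out of lines) with NA and prints the line.
result=$(paste -d '#' "$tmp/names1" "$tmp/names2" |
  awk -F'#' -v OFS='#' '{ for (i = 1; i <= NF; ++i) if (length($i) == 0) $i = "NA"; print }')

printf '%s\n' "$result"
rm -rf "$tmp"
```

With the sample data, the last line should come out as Sara Moheeb#NA, because file2 has no fifth line to pair it with.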

I guess your join method is also possible, but the files have nothing to "join" on. So join on an imaginary column of line numbers:

join -t'#' -eNA -a1 -a2 -o1.2,2.2 <(cut -d'#' -f2 file1 | nl -w1 -s'#') <(cut -d'#' -f2 file2 | nl -w1 -s'#')
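The line-number join can be tried end to end with the sketch below, which again rebuilds the sample data in a scratch directory. It makes one change that is my own assumption rather than part of the command above: the line numbers are zero-padded (nl -n rz -w 6). As far as I can tell, join expects its inputs to be sorted on the join field as strings, so plain numbers would fall out of order from line 10 onward ("10" sorts before "9"). The names numbered1, numbered2 and joined are illustrative only.

```shell
# Recreate the sample files from the question in a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' '11#San Noor#New York, US' '22#Maria Shiry#Dubai, UA' \
  '55#John Smith#London, England' '66#Viki Sam#Roman, Italy' \
  '81#Sara Moheeb#Montreal, Canada' > "$tmp/file1"
printf '%s\n' 'C1#Steve White#11' 'C2#Hight Look#21' \
  'E1#The Heaven is more#52' 'I1#The Roma Seen#55' > "$tmp/file2"

# Prefix each name with a zero-padded line number (000001#San Noor, ...),
# numbering every line (-b a) so that the keys stay in string order.
cut -d '#' -f2 "$tmp/file1" | nl -b a -n rz -w 6 -s '#' > "$tmp/numbered1"
cut -d '#' -f2 "$tmp/file2" | nl -b a -n rz -w 6 -s '#' > "$tmp/numbered2"

# Join on the line number (field 1 of both inputs), keep unpairable lines
# from either side (-a1 -a2), print only the two name fields (-o 1.2,2.2)
# and fill any missing field with NA (-e NA).
joined=$(LC_ALL=C join -t '#' -e NA -a 1 -a 2 -o 1.2,2.2 \
  "$tmp/numbered1" "$tmp/numbered2")

printf '%s\n' "$joined"
rm -rf "$tmp"
```

For the sample data this should give the five lines the question asks for, ending in Sara Moheeb#NA.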
