Can I build a hash in Unix that, for every row of one tab-separated file, prints all matching rows from a column of another tab-separated file?

In summary, I have two TAB-separated files, as described below.

Input1.tsv

34167   305603  S568    phosphorylation S       568
99024   179102  T170    Glycosylation   T       170                                                     
99025   179102  Y182    phosphorylation Y       182                                                     
74300   105800  S632    phosphorylation S       632                                                     
41095   105800  S748    phosphorylation S       748                                                     
41096   105800  S778    acethylation    S       778

and Input2.tsv

179102  FUCA1   NM_000147.4 NP_000138.2
179102  FUCA1   XM_005245821.2  XP_005245878.1
179102  FUCA1   XM_011541167.2  XP_011539469.1
179102  FUCA1   XM_017000905.1  XP_016856394.1
357819  AGT     NM_000029.3     NP_000020.1
105800  INPP5B  NM_001297434.1  NP_001284363.1
105800  INPP5B  NM_001297434.1  NP_001284363.1

Desired output

179102  FUCA1   NM_000147.4 NP_000138.2    Glycosylation   T       170   phosphorylation Y       182
179102  FUCA1   XM_005245821.2  XP_005245878.1    Glycosylation   T       170   phosphorylation Y       182
179102  FUCA1   XM_011541167.2  XP_011539469.1    Glycosylation   T       170   phosphorylation Y       182
179102  FUCA1   XM_017000905.1  XP_016856394.1    Glycosylation   T       170   phosphorylation Y       182
357819  AGT     NM_000029.3     NP_000020.1
105800  INPP5B  NM_001297434.1  NP_001284363.1    phosphorylation S       632   phosphorylation S       748    acethylation    S       778
105800  INPP5B  NM_001297434.1  NP_001284363.1    phosphorylation S       632   phosphorylation S       748    acethylation    S       778

I would like to build a hash that matches the second column of every row in the first file against the first column of the second file, and then print each row of the second file followed by all of its matches from the first file. I'm trying a hash like this:

awk 'BEGIN {FS=OFS="\t"} NR==FNR {h[$2]=$4"\t"$5"\t"$6; next} {print $0,h[$1]}' "input1" "input2" > "output";

The problem is that I think it prints only the first match found in the first file, not all of them, so not every match from the first file ends up in the output file. Is there a way to get the desired output in a Unix environment? Thanks in advance.

Modify your command as follows:

awk 'BEGIN {OFS="\t"} NR==FNR{h[$2]=h[$2] OFS $4 OFS $5 OFS $6;next} {print $0,h[$1]}' Input1.tsv Input2.tsv

This should give you the desired output.

Modified part:

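  • h[$2]=h[$2] OFS $4 OFS $5 OFS $6 : appends the fields of each matching row to whatever is already stored in h[$2] instead of overwriting it, so every match for a key is kept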
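
As a sketch of the same append idea, the variant below sets FS as well as OFS (as the question's own command does) and prints $0 and h[$1] without a comma between them. Each stored value already begins with a tab, so this avoids a doubled tab on matched rows and a trailing tab on unmatched ones. The sample rows are trimmed from the question's inputs; the temporary directory and the output.tsv name are assumptions made only for illustration.

```shell
# Hypothetical scratch directory holding trimmed copies of the question's inputs.
tmp=$(mktemp -d)
printf '99024\t179102\tT170\tGlycosylation\tT\t170\n99025\t179102\tY182\tphosphorylation\tY\t182\n41096\t105800\tS778\tacethylation\tS\t778\n' > "$tmp/Input1.tsv"
printf '179102\tFUCA1\tNM_000147.4\tNP_000138.2\n357819\tAGT\tNM_000029.3\tNP_000020.1\n105800\tINPP5B\tNM_001297434.1\tNP_001284363.1\n' > "$tmp/Input2.tsv"

# First file: append columns 4-6 to the entry keyed by column 2.
# Second file: print the row followed by everything collected for its column 1.
awk 'BEGIN { FS = OFS = "\t" }
     NR == FNR { h[$2] = h[$2] OFS $4 OFS $5 OFS $6; next }
     { print $0 h[$1] }' "$tmp/Input1.tsv" "$tmp/Input2.tsv" > "$tmp/output.tsv"

cat "$tmp/output.tsv"
```

With FS set to a tab, fields that contain spaces stay intact, but any trailing spaces in the input files become part of the last field, so it may be worth stripping them first.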
