Merge two files based on partial matching
I have two files.
FileA.txt
ID
479432_Sros_4274
330214_NIDE2792
517722_CJLT1_010100003977
257310_BB0482
...
FileB.txt (the ** markers are only there to help you spot the matches)
members category
6085.XP_002168109,**479432_Sros_4274**,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000 P
7159.AAEL006372-PA,**257310_BB0482** J
**517722_CJLT1_010100003977**,701176.VIBRN418_17773,9785.ENSLAFP00000010769,28377.ENSACAP00000014901,4081.Solyc03g120250.2.1,3847.GLYMA18G02240.1 U
500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,**330214_NIDE2792** S
...
Expected output
Output.txt
ID category
479432_Sros_4274 P
330214_NIDE2792 S
517722_CJLT1_010100003977 U
257310_BB0482 J
...
I have tried some awk and R code based on answers to other questions, but I could not get the desired output.
Here is one way:
$ awk '
NR==FNR {                 # process file1
    if(FNR==1)            # print header, no newline
        printf "%s", $1   # explicit format string, so the data is never parsed as a format
    a[$1]                 # hash data (the header "ID" is hashed too)
    next
}
{                         # process file2
    if(FNR==1)            # print the other half of the header
        print OFS $2
    for(i in a)           # loop all items in hash
        if($1 ~ i)        # check for partial (regex) match
            print i,$2    # if found, output
}' file1 file2            # mind the order
Output (in file2 order; note the partial match on the last line of the output, left in as a warning):
ID category
479432_Sros_4274 P
257310_BB0482 J
517722_CJLT1_010100003977 U
330214_NIDE2792 S
ID S
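The stray `ID S` line has two causes: the header `ID` of the first file is stored as a pattern, and `~` performs a regex substring match, so `ID` is found inside `330214_NIDE2792`. A minimal sketch of one way to avoid both, assuming the members column is a comma-separated list without embedded spaces and the real file does not contain the `**` markers: skip the header when hashing, then split the members on commas and look each one up exactly. `match_exact` is a hypothetical function name introduced for illustration, and the sample files are cut down from the question.

```shell
#!/bin/sh
# Sketch: exact-match IDs from file1 against the comma-separated members in file2.
# match_exact is a hypothetical helper name, not part of the original answer.
match_exact() {
    awk '
    NR==FNR {                      # first file: the ID list
        if (FNR == 1)
            hdr = $1               # keep the header text, do not hash it
        else
            ids[$1]                # hash the real IDs
        next
    }
    FNR==1 {                       # second file: header line
        print hdr, $2
        next
    }
    {
        n = split($1, m, ",")      # members are assumed to be comma-separated
        for (k = 1; k <= n; k++)
            if (m[k] in ids)       # exact hash lookup, no regex involved
                print m[k], $2
    }' "$1" "$2"
}

# Small sample taken from the question (without the ** markers)
tmp=$(mktemp -d)
printf '%s\n' 'ID' '479432_Sros_4274' '330214_NIDE2792' '257310_BB0482' > "$tmp/FileA.txt"
printf '%s\n' 'members category' \
    '7159.AAEL006372-PA,257310_BB0482 J' \
    '500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,330214_NIDE2792 S' > "$tmp/FileB.txt"

match_exact "$tmp/FileA.txt" "$tmp/FileB.txt"
```

Because each member is looked up directly in the hash, this also avoids looping over every ID for every line of the second file, which may matter for large inputs. Output is still in file2 order.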
Could you please try the following.
awk '
BEGIN{
  print "ID category"
}
FNR==NR{
  a[$0]
  next
}
{
  for(i in a){
    if(match($0,i)){
      print i,$NF
    }
  }
}
' Input_filea Input_fileb
Explanation: adding a detailed explanation of the above code.
awk '                              ##Starting awk program here.
BEGIN{                             ##Starting BEGIN section from here.
  print "ID category"              ##Printing the header string "ID category" here.
}                                  ##Closing BLOCK for BEGIN section.
FNR==NR{                           ##Condition FNR==NR is TRUE only while the 1st Input_file is being read.
  a[$0]                            ##Creating an array named a whose index is $0 (the whole line).
  next                             ##next will skip all further statements from here.
}
{
  for(i in a){                     ##Traversing through array a with a for loop.
    if(match($0,i)){               ##If index i, used as a regex, matches somewhere in the current line then do following.
      print i,$NF                  ##Printing variable i and $NF (last field) of current line.
    }
  }
}
' Input_filea Input_fileb          ##Mentioning Input_file names here.
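Both commands above print matches in the order of the second file, whereas the expected output in the question follows the order of FileA.txt. If that order matters, one option is to remember the order of the IDs and print in an END block. This is only a sketch under stated assumptions: the members are comma-separated, the category is the last whitespace-separated field, and the real data has no `**` markers. `by_id_order` is a hypothetical name, and IDs with no match are silently skipped here.

```shell
#!/bin/sh
# Sketch: print ID/category pairs in the order of the ID file.
# by_id_order is a hypothetical helper; arguments are the ID file, then the members file.
by_id_order() {
    awk '
    NR==FNR {                          # first file: the ID list
        if (FNR == 1) { hdr = $1; next }
        order[++n] = $1                # remember the original order of the IDs
        want[$1]
        next
    }
    FNR==1 { cat_hdr = $NF; next }     # header of the members file
    {
        cnt = split($1, m, ",")        # members are assumed to be comma-separated
        for (k = 1; k <= cnt; k++)
            if (m[k] in want)          # exact lookup, no regex
                found[m[k]] = $NF      # category is the last field
    }
    END {
        print hdr, cat_hdr
        for (j = 1; j <= n; j++)
            if (order[j] in found)
                print order[j], found[order[j]]
    }' "$1" "$2"
}

# Small sample taken from the question (without the ** markers)
tmp2=$(mktemp -d)
printf '%s\n' 'ID' '479432_Sros_4274' '330214_NIDE2792' '257310_BB0482' > "$tmp2/FileA.txt"
printf '%s\n' 'members category' \
    '6085.XP_002168109,479432_Sros_4274,4956.XP_002495993.1,457425.SSHG_03214,51511.ENSCSAVP000 P' \
    '7159.AAEL006372-PA,257310_BB0482 J' \
    '500485.XP_002561312.1,1042876.PPS_0730,222929.XP_003071446.1,330214_NIDE2792 S' > "$tmp2/FileB.txt"

by_id_order "$tmp2/FileA.txt" "$tmp2/FileB.txt"
```

Only the IDs and their categories are held in memory, so the size of the members file should not matter much. If an ID appears on several lines of the members file, this sketch keeps the last category seen.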