
How to use this awk command without affecting the header

Good night. I have these two files:

File 1 - with phenotype information; the first column holds the IDs. The original file has 400 rows:

ID  a b  c          d 
215 2 25 13.8354303 15.2841303
222 2 25.2 15.8507278 17.2994278
216 2 28.2 13.0482192 14.4969192
223 11 15.4 9.2714745 11.6494745

File 2 - with SNP information. The original file has 400 lines and 42,000 characters per line.

ID  t u j l
215 2 0 2 1 
222 2 0 1 1 
216 2 0 2 1 
223 2 0 2 2 
217 2 0 2 1 
218 0 2 0 2 

I need to remove from file 2 the individuals that do not appear in file 1, for example:

ID  t u j l
215 2 0 2 1 
222 2 0 1 1 
216 2 0 2 1 
223 2 0 2 2

I used this code:

awk 'NR==FNR{a[$1]; next}$1 in a{print $0}' file2 file1 > file3

and I get this output (file 3):

215 2 0 2 1 
222 2 0 1 1 
216 2 0 2 1 
223 2 0 2 2

but I lose the header. How do I keep it?

To keep the header of the second file, add a condition{action} pair like this:

awk 'NR==FNR {a[$1]; next}
     FNR==1  {print $0; next}  # <= this will print the header of file2.
     $1 in a {print $0}' file1 file2

NR holds the total record number across all input files, while FNR is the record number within the file currently being processed, so NR==FNR is true only while the first file is being read. The next statements are also important: each one moves on to the next record immediately, so the remaining actions are not tried for the current one.
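To see the two counters side by side, here is a minimal sketch that assumes trimmed-down copies of the sample files. The temporary directory and the variable names (tmp, counters, filtered) are made up for illustration and are not part of the question.

```shell
# Illustrative setup: small stand-ins for file1 and file2 in a temp directory.
tmp=$(mktemp -d)
printf 'ID a b\n215 2 25\n222 2 25.2\n' > "$tmp/file1"
printf 'ID t u\n215 2 0\n217 2 0\n222 2 0\n' > "$tmp/file2"

# Show NR (global count) and FNR (per-file count) next to the first field.
# FNR restarts at 1 when awk moves on to file2, while NR keeps counting.
counters=$(awk '{print NR, FNR, $1}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$counters"

# Same logic as the answer: collect IDs from file1, always print the
# header line of file2, then print only file2 rows whose ID was seen.
filtered=$(awk 'NR==FNR {a[$1]; next}
     FNR==1  {print $0; next}
     $1 in a {print $0}' "$tmp/file1" "$tmp/file2")
printf '%s\n' "$filtered"

rm -rf "$tmp"
```

In the first listing the header of file2 shows up as record 4 overall but record 1 of its file, which is why FNR==1 can single it out once the NR==FNR rule has stopped matching.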


 