
How to use awk for a compressed file

How can I change the following command so that it works on compressed files?

awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' input1.vcf input2.vcf

The command works fine with uncompressed files. I need to change it to handle compressed files.

You need to read the compressed files like this:

awk '{ ... }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)
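
As a minimal sketch of why this works, the following bash snippet builds two tiny gzip files and joins them with the command from the question. The sample data and the temporary directory are made up for illustration, and process substitution requires bash (or another shell that supports `<(...)`), not plain sh.

```shell
#!/usr/bin/env bash
# Build two tiny gzip-compressed, tab-separated sample files (made-up data).
tmpdir=$(mktemp -d)
printf 'chr1\t100\tx\tx\tx\tx\tx\tAA=T\n' | gzip > "$tmpdir/input1.vcf.gz"
printf 'chr1\t100\tb\nchr2\t200\tc\n' | gzip > "$tmpdir/input2.vcf.gz"

# Each <(...) reaches awk as its own file name, so FNR restarts at 1 when
# the second stream begins and FNR==NR is true only for the first one.
result=$(awk 'FNR==NR { array[$1,$2]=$8; next }
              ($1,$2) in array { print $0 ";" array[$1,$2] }' \
         <(gzip -dc "$tmpdir/input1.vcf.gz") <(gzip -dc "$tmpdir/input2.vcf.gz"))

# Only the chr1/100 line of the second file has a matching key.
printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Only the line of the second file whose first two columns also occur in the first file is printed, with column 8 of the first file appended after a semicolon.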

Try this:

awk 'FNR==NR { sub(/AA=\.;/,""); array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz) | gzip > output.vcf.gz

zcat FILE | awk '{ ...}'

I can't tell which of these methods works best, but zcat is at least faster to type ;)

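Each file has to reach awk as a separate input, because FNR==NR can only single out the first file when awk sees two files. Decompressing both files into one pipe would merge them into a single stream, so use one process substitution per file:

awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(bzip2 -dc input1.vcf.bz2) <(bzip2 -dc input2.vcf.bz2)

or

awk 'FNR==NR { array[$1,$2]=$8; next } ($1,$2) in array { print $0 ";" array[$1,$2] }' <(gzip -dc input1.vcf.gz) <(gzip -dc input2.vcf.gz)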

EDIT:

To write compressed output, just append

| bzip2 >output.vcf.bz2

or

| gzip >output.vcf.gz

This works with any program that prints its results to standard output.

BTW: Editing such long command lines gets tedious very quickly. You should consider writing a small shell script to do the job. This has the additional benefit that you don't have to remember the entire command and can easily repeat or modify it when necessary.
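
Such a script could look roughly like the sketch below. The file name join_vcf.sh and the function name join_vcf are hypothetical, and the sketch assumes gzip-compressed input and output plus bash for the process substitution.

```shell
#!/usr/bin/env bash
# join_vcf.sh (hypothetical name): append column 8 of the first file to the
# matching lines of the second file, reading and writing gzip data.

join_vcf() {
    local first=$1 second=$2 output=$3
    # One process substitution per input keeps the two files separate for awk.
    awk 'FNR==NR { array[$1,$2]=$8; next }
         ($1,$2) in array { print $0 ";" array[$1,$2] }' \
        <(gzip -dc "$first") <(gzip -dc "$second") | gzip > "$output"
}

# Run the join only when all three file arguments were given.
if [ "$#" -eq 3 ]; then
    join_vcf "$1" "$2" "$3"
fi
```

It would then be called as: bash join_vcf.sh input1.vcf.gz input2.vcf.gz output.vcf.gz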

A good starting point for Linux shell programming is the Bash Programming Introduction by Mike G.
