
concatenate files awk/linux

I have n files in a folder, each of which starts with lines like those shown below.

##contig=<ID=chr38,length=23914537>
##contig=<ID=chrX,length=123869142>
##contig=<ID=chrMT,length=16727>
##samtoolsVersion=0.1.19-44428cd
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120
chr1    412573  SNP74   A       C       2040.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1    602567  BICF2G630707977 A       G       877.77  PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;  
chr1    604894  BICF2G630707978 A       G       2044.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1    693376  .       GCCCCC  GCCCC   761.73  .       AC=2;AC1=2;AF=1.00;AF1=1;

There are n such files. I want to concatenate them into a single file: all lines beginning with # should be deleted from every file, except for the header line (the one starting with #CHROM), and the remaining rows from all the files should follow it. Example output is shown below:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120
chr1    412573  SNP74   A       C       2040.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;DP=58;
chr1    602567  BICF2G630707977 A       G       877.77  PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;  
chr1    604894  BICF2G630707978 A       G       2044.77 PASS    AC=2;AC1=2;AF=1.00;AF1=1;AN=2;DB;
chr1    693376  .       GCCCCC  GCCCC   761.73  .       AC=2;AC1=2;AF=1.00;AF1=1;

Specifically with awk:

awk '$0!~/^#/{print $0}' file1 file2 file3 > outputfile

Broken down: the condition checks that the line ($0) does not match (!~) the pattern /^#/, i.e. that it does not begin with #, and if so the line is printed. The input files are read in the order given and the result is redirected (>) to outputfile.
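To see what this filter keeps, here is a minimal sketch run on two made-up files. The names a.vcf and b.vcf and their contents are assumptions for illustration only. Because the pattern matches every line beginning with #, the #CHROM header is removed along with the ## lines.

```shell
#!/bin/sh
# Hypothetical sample data: two tiny files in a scratch directory.
tmpdir=$(mktemp -d)
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > "$tmpdir/a.vcf"
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > "$tmpdir/b.vcf"

# Same filter as above: keep only lines that do not start with '#'.
awk '$0 !~ /^#/ { print $0 }' "$tmpdir/a.vcf" "$tmpdir/b.vcf" > "$tmpdir/outputfile"

# Only the two data rows remain; no header line survives.
cat "$tmpdir/outputfile"
```

If the header is needed, it has to be written separately, or the filter has to be loosened to let the first #CHROM line through.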

Your problem is not terribly well specified, but I think you are just looking for:

sed '/^##/d' $FILE_LIST > output

Where FILE_LIST is the list of input files (you may be able to use *).

Or you can use grep like this:

grep -vh "^##" *

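The -v option inverts the match and -h suppresses the filename prefix, so the command prints every line that does NOT start with ## from all the files, without file names. Note that the #CHROM line of each file starts with a single # and is therefore kept.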
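The effect of the two options can be checked on a small example. This is only a sketch: the files a.vcf and b.vcf and their contents are invented for the demonstration, and it assumes a grep that supports -h (GNU and BSD grep do).

```shell
#!/bin/sh
# Hypothetical sample data in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > a.vcf
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > b.vcf

# Without -h, grep prefixes every line with the file it came from.
with_names=$(grep -v '^##' a.vcf b.vcf)

# With -h the prefixes are gone, but each file still contributes its own #CHROM line.
without_names=$(grep -vh '^##' a.vcf b.vcf)
header_count=$(printf '%s\n' "$without_names" | grep -c '^#CHROM')

printf '%s\n' "$with_names"
printf '%s\n' "$without_names"
echo "header lines: $header_count"
```

With two input files the header count is 2, which is why an extra step is needed when the header should appear only once.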

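Or, if you want to emit one header line at the start (writing the result outside the input directory, so that * does not pick it up):

(grep -h -m1 '^#CHROM' * | head -n 1; grep -hv '^#' *) > ../out.txt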

If I understood correctly, you could do:

echo "#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  P922_120" > mergedfile
for file in $FILES; do grep -v '^#' "$file" >> mergedfile; done

Note that $FILES holds the list of input files (for example, the output of ls), and that grep's -v option selects the lines that do not match.

I believe what you want is:

awk '$0 ~ /^##/ { next } $0 ~ /^#/ && !printed_header { print; printed_header = 1 } $0 !~ /^#/ { print }' file1 file2 file3
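The same single-pass idea can be written a little more compactly and tried on sample data. The sketch below is one possible variant, not the answer's exact command: the counter name seen and the files a.vcf and b.vcf are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: two tiny files in a scratch directory.
tmpdir=$(mktemp -d)
printf '##meta=1\n#CHROM\tPOS\nchr1\t100\n' > "$tmpdir/a.vcf"
printf '##meta=2\n#CHROM\tPOS\nchr2\t200\n' > "$tmpdir/b.vcf"

merged=$(awk '
    /^##/ { next }                         # drop metadata lines
    /^#/  { if (!seen++) print; next }     # print the first header only
    { print }                              # keep every data line
' "$tmpdir/a.vcf" "$tmpdir/b.vcf")

printf '%s\n' "$merged"
```

On this input the result is the #CHROM line once, followed by the two data rows in file order.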
