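How to extract strings using awk from multiple text files and summarize them into one file
I have 70 input files with names like slurm-22801576.out, slurm-22801573.out, slurm-26801571.out, and so on. I want to extract all the strings I need into a single file. I did the following, but it only works on one file. How can I do this for multiple files?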
awk 'BEGIN{printf "file,reads,file,sample\n"}NR==32{printf "%s,%s,",FILENAME,$3}NR==2{printf "%s,%s,",FILENAME,$18}' slurm-22801576.out > summary/total_reads.csv
But my output file contains only one data row:
file,reads,file,sample
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
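Only one row appears because only one file name was passed. Listing more files would not be enough on its own, though: NR counts lines across all input files, while FNR restarts at 1 in each file, so an NR==2 condition fires only once. The sketch below demonstrates this on two made-up files (a.txt and b.txt are hypothetical names used only for the demonstration):

```shell
#!/bin/sh
# Demonstrates NR vs FNR across several input files.
# a.txt and b.txt are hypothetical throwaway files.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf 'a1\na2\na3\n' > a.txt
printf 'b1\nb2\nb3\n' > b.txt

# NR keeps counting across files: line 2 of b.txt is NR==5,
# so only a.txt is printed.
nr_out=$(awk 'NR==2 { print FILENAME }' a.txt b.txt)

# FNR restarts at 1 for each file, so line 2 matches in both files.
fnr_out=$(awk 'FNR==2 { print FILENAME }' a.txt b.txt)

echo "NR==2:  $nr_out"
echo "FNR==2: $fnr_out"
```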
Each input file contains text like this:
job starting at 23:42:13
java -ea -Xmx57039m -Xms57039m -cp /sw/bioinfo/bbmap/38.61b/rackham/current/ align2.BBMap build=1 overwrite=true fastareadlen=500 pairedonly=t ambiguous=toss secondary=t killbadpairs=t perfectmode=t minid=1 mappedonly=t outm=2006_40_aligned.sam scafstats=2006_40_fulllength.scafstats in=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R1.fq.gz in2=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R2.fq.gz threads=auto
Executing align2.BBMap [build=1, overwrite=true, fastareadlen=500, pairedonly=t, ambiguous=toss, secondary=t, killbadpairs=t, perfectmode=t, minid=1, mappedonly=t, outm=2006_40_aligned.sam, scafstats=2006_40_fulllength.scafstats, in=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R1.fq.gz, in2=/crex/proj/datasets/human_depleted/Ki-2006-40-226_unmapped_R2.fq.gz, threads=auto]
Version 38.61
Set OUTPUT_MAPPED_ONLY to true
Scaffold statistics will be written to 2006_40_fulllength.scafstats
Set threads to 10
Ambiguously mapped reads will be considered unmapped.
Set MINIMUM_ALIGNMENT_SCORE_RATIO to 1.000
Set genome to 1
Loaded Reference: 2.275 seconds.
Loading index for chunk 1-1, build 1
Generated Index: 4.174 seconds.
Analyzed Index: 3.394 seconds.
Started output stream: 0.264 seconds.
Creating scaffold statistics table: 0.064 seconds.
Cleared Memory: 1.216 seconds.
Processing reads in paired-ended mode.
Started read stream.
Started 10 mapping threads.
Detecting finished threads: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
------------------ Results ------------------
Genome: 1
Key Length: 13
Max Indel: 0
Minimum Score Ratio: 1.0
Mapping Mode: perfect
Reads Used: 461344228 (57756075474 bases)
Mapping: 595.769 seconds.
Reads/sec: 774367.86
kBases/sec: 96943.77
The expected output file should look like this:
file,reads,file,sample
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
slurm-22801576.out,outm=2006_42_aligned.sam,slurm-22801576.out,480789160,
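You can use this awk command. The slurm-*.out glob passes every file to awk, and FNR (the line number within the current file) replaces NR, which keeps counting across all files and would therefore match line 2 and line 32 only in the first file. The \n at the end of the second printf ends each file's row: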
awk '
BEGIN{print "file,reads,file,sample"}
FNR==2 {printf "%s,%s,", FILENAME, $18}
FNR==32 {printf "%s,%s,\n", FILENAME, $3}
' slurm-*.out > summary/total_reads.csv
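If the position of the "Reads Used" line can shift between runs (in the excerpt above it is not on line 32, perhaps because blank lines were lost), matching lines by content rather than by line number may be more robust. The following is a sketch under stated assumptions: each slurm-*.out is assumed to contain one line starting with "java" that holds a single outm= token, followed later by one line starting with "Reads Used:". The two sample files are made up for illustration, and the header is kept exactly as in the question even though the second column holds the outm= value. Each row is printed in a single printf, so a file missing one of the lines cannot glue a partial row onto the next file's row.

```shell
#!/bin/sh
# Sketch: pick lines by content instead of fixed line numbers.
# ASSUMPTION: each log has a "java ..." line with one outm= token and a
# later "Reads Used: N (...)" line. The sample files below are made up.
tmp=$(mktemp -d)
cd "$tmp" || exit 1
mkdir -p summary

printf 'job starting at 23:42:13\njava -ea -cp x align2.BBMap outm=2006_40_aligned.sam threads=auto\nReads Used: 461344228 (57756075474 bases)\n' > slurm-1.out
# The extra blank line moves "Reads Used" down by one line.
printf 'job starting at 01:00:00\njava -ea -cp x align2.BBMap outm=2006_42_aligned.sam threads=auto\n\nReads Used: 480789160 (60000000000 bases)\n' > slurm-2.out

awk '
BEGIN { print "file,reads,file,sample" }
FNR == 1 { sample = "" }                  # reset for each new input file
$1 == "java" {
    for (i = 1; i <= NF; i++)
        if ($i ~ /^outm=/) sample = $i    # e.g. outm=2006_40_aligned.sam
}
/^Reads Used:/ { printf "%s,%s,%s,%s,\n", FILENAME, sample, FILENAME, $3 }
' slurm-*.out > summary/total_reads.csv

cat summary/total_reads.csv
```

Restricting the outm= search to the line whose first field is "java" avoids also matching the "Executing align2.BBMap [...]" line, where the same token appears with a trailing comma.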