How to extract specific information from multiple files and make a table in Linux?
I have multiple text files with information. Two of them are shown below:
Sample1.txt
Status /documents/Sample1.sorted.bam
Assigned 50945040
Unassigned_Unmapped 947866
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 49013681
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 21189312
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 4430011
Sample2.txt
Status /documents/Sample2.sorted.bam
Assigned 36335614
Unassigned_Unmapped 870456
Unassigned_MappingQuality 0
Unassigned_Chimera 0
Unassigned_FragmentLength 0
Unassigned_Duplicate 0
Unassigned_MultiMapping 68688141
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_NoFeatures 23746485
Unassigned_Overlapping_Length 0
Unassigned_Ambiguity 3734593
For a single text file I'm using grep:
grep "Assigned\|Unmapped\|MultiMapping\|NoFeatures\|Ambiguity" Sample1.txt > output.txt
But I want output like the table below, produced by a small script that I can run on all the text files:
                         Sample1   Sample2
Assigned                 50945040  36335614
Unassigned_Unmapped      947866    870456
Unassigned_MultiMapping  49013681  68688141
Unassigned_NoFeatures    21189312  23746485
Unassigned_Ambiguity     4430011   3734593
$ cat tst.awk
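# Skip rows whose value in the first file is 0
$2 != 0 {
    # First column: the row label, or "Name" on the header line
    printf "%s%s", (NR>1 ? $1 : "Name"), OFS
    for (i=2; i<=NF; i+=2) {
        # Reduce a path such as /documents/Sample1.sorted.bam to Sample1
        gsub(/^.*\/|\..*$/,"",$i)
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}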
$ paste Sample1.txt Sample2.txt | awk -f tst.awk | column -t
Name                     Sample1   Sample2
Assigned                 50945040  36335614
Unassigned_Unmapped      947866    870456
Unassigned_MultiMapping  49013681  68688141
Unassigned_NoFeatures    21189312  23746485
Unassigned_Ambiguity     4430011   3734593
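Note that `$2 != 0` selects rows by the value in the first file, not by name; with the data in the question the two happen to coincide. If a wanted row could be 0 in the first file, something along these lines might be closer to the original grep. This is only a sketch: the scratch directory, the shortened demo files, and the names `want`, `val`, `header` and `nfiles` are all made up for illustration, and it assumes every file starts with a `Status` line.

```shell
#!/bin/sh
# Hypothetical demo data: shortened copies of the two files from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/Sample1.txt" <<'EOF'
Status /documents/Sample1.sorted.bam
Assigned 50945040
Unassigned_Unmapped 947866
Unassigned_MappingQuality 0
Unassigned_MultiMapping 49013681
Unassigned_NoFeatures 21189312
Unassigned_Ambiguity 4430011
EOF
cat > "$tmpdir/Sample2.txt" <<'EOF'
Status /documents/Sample2.sorted.bam
Assigned 36335614
Unassigned_Unmapped 870456
Unassigned_MappingQuality 0
Unassigned_MultiMapping 68688141
Unassigned_NoFeatures 23746485
Unassigned_Ambiguity 3734593
EOF

# Read the files one after another (no paste) and pick rows by name.
table=$(awk '
BEGIN {
    OFS = "\t"
    # Row labels to keep, in output order (same set as the grep in the question)
    n = split("Assigned Unassigned_Unmapped Unassigned_MultiMapping Unassigned_NoFeatures Unassigned_Ambiguity", want, " ")
}
FNR == 1 {
    # First line of each file: turn the path into a column heading
    name = $2
    gsub(/^.*\/|\..*$/, "", name)
    header = header OFS name
    nfiles++
    next
}
{ val[$1, nfiles] = $2 }    # remember every value by (label, file number)
END {
    print "Name" header
    for (r = 1; r <= n; r++) {
        line = want[r]
        for (f = 1; f <= nfiles; f++)
            line = line OFS val[want[r], f]
        print line
    }
}' "$tmpdir/Sample1.txt" "$tmpdir/Sample2.txt")

printf '%s\n' "$table"
```

The result is tab-separated, so it can be piped through `column -t` as above. Because values are looked up by label, the files do not need to list their rows in the same order.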
To get output that Excel can understand, rather than the output shown in the question, do this:
$ cat tst.awk
BEGIN { OFS="," }
$2 != 0 {
    printf "%s%s", (NR>1 ? $1 : "Name"), OFS
    for (i=2; i<=NF; i+=2) {
        gsub(/^.*\/|\..*$/,"",$i)
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}
$ paste Sample1.txt Sample2.txt | awk -f tst.awk > output.csv
Then double-click on output.csv to open it in Excel.
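The same pipeline should extend to more than two files, because the loop walks every second field up to NF. Here is a rough sketch using a shell glob. The three tiny input files and their numbers are invented purely for the demo, and it assumes every file lists the same rows in the same order, since paste joins lines purely by position.

```shell
#!/bin/sh
# Hypothetical demo data: three small files with made-up counts.
tmpdir=$(mktemp -d)
for n in 1 2 3; do
    printf 'Status\t/documents/Sample%s.sorted.bam\nAssigned\t%s00\nUnassigned_Chimera\t0\nUnassigned_Ambiguity\t%s0\n' \
        "$n" "$n" "$n" > "$tmpdir/Sample$n.txt"
done

# Paste all files side by side, then run the same awk program as in tst.awk.
csv=$(paste "$tmpdir"/Sample*.txt | awk '
BEGIN { OFS = "," }
$2 != 0 {
    printf "%s%s", (NR>1 ? $1 : "Name"), OFS
    for (i=2; i<=NF; i+=2) {
        gsub(/^.*\/|\..*$/, "", $i)
        printf "%s%s", $i, (i<NF ? OFS : ORS)
    }
}')

printf '%s\n' "$csv"
```

The glob expands in lexical order, so Sample10.txt would sort before Sample2.txt; the column headings come from the paths inside the files, so they still label the right columns.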