bash/sed: producing 2D bar plots from multi-column data
I am analyzing multi-column data organized as follows:
#Acceptor DonorH Donor Frames Frac AvgDist AvgAng
lig_608@O1 GLU_166@H GLU_166@N 13731 0.6865 2.8609 160.4598
lig_608@O2 HIE_163@HE2 HIE_163@NE2 8320 0.4160 2.8412 150.3665
lig_608@N2 ASN_142@HD21 ASN_142@ND2 1575 0.0788 2.9141 157.3493
lig_608@N2 THR_25@HG1 THR_25@OG1 218 0.0109 2.8567 156.0376
lig_608@O1 GLN_189@HE22 GLN_189@NE2 72 0.0036 2.8427 157.3778
lig_608@N2 ASN_142@HD22 ASN_142@ND2 43 0.0022 2.9093 165.3063
lig_608@N2 SER_46@HG SER_46@OG 32 0.0016 2.8710 159.8673
lig_608@F1 HIE_41@HE2 HIE_41@NE2 31 0.0015 2.8904 153.0763
lig_608@O2 SER_144@HG SER_144@OG 20 0.0010 2.8147 144.6951
lig_608@N2 THR_24@HG1 THR_24@OG1 16 0.0008 2.8590 165.3937
lig_608@O2 GLY_143@H GLY_143@N 15 0.0008 2.8729 149.1930
lig_608@F1 GLN_189@HE22 GLN_189@NE2 15 0.0008 2.9192 146.2273
lig_608@O2 SER_144@H SER_144@N 10 0.0005 2.9259 148.8008
lig_608@N2 THR_26@H THR_26@N 8 0.0004 2.9491 149.1861
lig_608@O2 GLU_166@H GLU_166@N 4 0.0002 2.8839 150.1238
lig_608@N2 GLN_189@HE21 GLN_189@NE2 3 0.0001 2.9567 153.7993
lig_608@N2 ASN_119@HD21 ASN_119@ND2 2 0.0001 2.8564 147.7916
lig_608@O2 CYS_145@H CYS_145@N 2 0.0001 2.8867 151.6423
lig_608@O1 GLN_189@HE21 GLN_189@NE2 2 0.0001 2.8888 148.3678
lig_608@N2 GLY_143@H GLY_143@N 2 0.0001 2.9658 149.2518
lig_608@F1 GLN_189@HE21 GLN_189@NE2 1 0.0001 2.8675 139.9754
lig_608@F1 GLN_189@H GLN_189@N 1 0.0001 2.8987 168.1758
lig_608@N2 HIE_41@HE2 HIE_41@NE2 1 0.0001 2.9411 147.0443
From this I need to use the third column (Donor) and the fifth column (Frac), and plot a 2D histogram of only those rows whose fifth-column value is greater than 0.01. So in the example above, only the following data should be considered:
#Donor #Frac
GLU_166@N 0.6865
HIE_163@NE2 0.4160
ASN_142@ND2 0.0788
THR_25@OG1 0.0109
The 2D histogram should plot Donor on the X axis and Frac on the Y axis (in %).
Previously, I had to add the following lines to the top of the reduced two-column data file so that gracebat would recognize it as a 2D bar plot:
@ title "No title"
@ xaxis label "Donor"
@ yaxis label "Frac"
@s0 line type 0
@TYPE bar
# here is the data in 2 column format
Is it possible to automate this post-processing so that the bar plot is produced on the fly? Alternatively, I would be grateful for a sed solution that edits the data file on the fly, reducing it to 2 columns and inserting at the beginning the @ lines required for the bar plot, using something like:
sed -i 's/old-text/new-text/g' datafile
sed isn't meant for this kind of task; you should use awk:
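awk '
# Grace header lines that make gracebat draw set s0 as a bar plot
BEGIN {
    print "@ title \"No title\""
    print "@ xaxis label \"Donor\""
    print "@ yaxis label \"Frac\""
    print "@s0 line type 0"
    print "@TYPE bar"
}
# skip the header line, keep rows with Frac > 0.01, print Donor and Frac
NR > 1 && $5 > 0.01 { print $3, $5 }
' file.txt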
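Since the question asks for Frac in percent and for the file to be edited in place (like `sed -i`), here is a minimal sketch that combines both. It assumes the whitespace-separated layout shown above and that `mktemp` is available; the function name `to_grace_bar` is hypothetical. It writes to a temporary file and then moves it over the input, so the original file is only replaced if awk succeeds.

```shell
#!/bin/sh
# Hypothetical helper: reduce the table to "Donor Frac%" pairs, prepend the
# Grace bar-plot header, and overwrite the input file (similar in spirit to sed -i).
to_grace_bar() {
    infile=$1
    tmp=$(mktemp) || return 1
    awk '
    BEGIN {
        print "@ title \"No title\""
        print "@ xaxis label \"Donor\""
        print "@ yaxis label \"Frac (%)\""
        print "@s0 line type 0"
        print "@TYPE bar"
    }
    # skip the header, keep rows with Frac > 0.01, scale Frac to percent
    NR > 1 && $5 > 0.01 { printf "%s %.2f\n", $3, $5 * 100 }
    ' "$infile" > "$tmp" && mv "$tmp" "$infile"
}
```

Usage: `to_grace_bar file.txt`, then pass `file.txt` to gracebat as before. Note that `mv` gives the result the permissions of the temporary file rather than the original.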
You could also do this with an on-the-fly generated Gnuplot script, e.g.:
cat <<EOS | gnuplot > output.png
set term pngcairo size 1280,960
set xtics noenhanced
set xlabel "Donor"
set ylabel "Frac"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.01' infile.tsv" using 0:5:xtic(3) with boxes
EOS
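If you want the Y axis in percent, as the question requests, and prefer not to hard-code file names, the same script can be wrapped in a function that only prints the gnuplot commands. This is a sketch: the function name `gp_bar_script` is hypothetical, and it assumes the input file name contains no spaces or quotes, since it is spliced into the awk command that gnuplot runs.

```shell
#!/bin/sh
# Hypothetical helper: print a gnuplot script that draws Frac (in %) per Donor
# for rows with Frac > 0.01. Pipe the output into gnuplot to render the PNG.
gp_bar_script() {
    infile=$1
    outfile=$2
    # \$5 is escaped so the shell leaves it for awk and gnuplot to interpret
    cat <<EOS
set term pngcairo size 1280,960
set output "$outfile"
set xtics noenhanced
set xlabel "Donor"
set ylabel "Frac (%)"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR > 1 && \$5 > 0.01' $infile" using 0:(\$5*100):xtic(3) with boxes
EOS
}
```

Usage: `gp_bar_script file.txt output.png | gnuplot`. Printing the script instead of running gnuplot directly also makes it easy to inspect or save the generated commands.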
This writes the bar plot to output.png.