bash/sed: producing 2D bar plots from multi-column data

I am analysing multi-column data organized in the following manner:

#Acceptor                DonorH           Donor   Frames         Frac      AvgDist       AvgAng
lig_608@O1            GLU_166@H       GLU_166@N    13731       0.6865       2.8609     160.4598
lig_608@O2          HIE_163@HE2     HIE_163@NE2     8320       0.4160       2.8412     150.3665
lig_608@N2         ASN_142@HD21     ASN_142@ND2     1575       0.0788       2.9141     157.3493
lig_608@N2           THR_25@HG1      THR_25@OG1      218       0.0109       2.8567     156.0376
lig_608@O1         GLN_189@HE22     GLN_189@NE2       72       0.0036       2.8427     157.3778
lig_608@N2         ASN_142@HD22     ASN_142@ND2       43       0.0022       2.9093     165.3063
lig_608@N2            SER_46@HG       SER_46@OG       32       0.0016       2.8710     159.8673
lig_608@F1           HIE_41@HE2      HIE_41@NE2       31       0.0015       2.8904     153.0763
lig_608@O2           SER_144@HG      SER_144@OG       20       0.0010       2.8147     144.6951
lig_608@N2           THR_24@HG1      THR_24@OG1       16       0.0008       2.8590     165.3937
lig_608@O2            GLY_143@H       GLY_143@N       15       0.0008       2.8729     149.1930
lig_608@F1         GLN_189@HE22     GLN_189@NE2       15       0.0008       2.9192     146.2273
lig_608@O2            SER_144@H       SER_144@N       10       0.0005       2.9259     148.8008
lig_608@N2             THR_26@H        THR_26@N        8       0.0004       2.9491     149.1861
lig_608@O2            GLU_166@H       GLU_166@N        4       0.0002       2.8839     150.1238
lig_608@N2         GLN_189@HE21     GLN_189@NE2        3       0.0001       2.9567     153.7993
lig_608@N2         ASN_119@HD21     ASN_119@ND2        2       0.0001       2.8564     147.7916
lig_608@O2            CYS_145@H       CYS_145@N        2       0.0001       2.8867     151.6423
lig_608@O1         GLN_189@HE21     GLN_189@NE2        2       0.0001       2.8888     148.3678
lig_608@N2            GLY_143@H       GLY_143@N        2       0.0001       2.9658     149.2518
lig_608@F1         GLN_189@HE21     GLN_189@NE2        1       0.0001       2.8675     139.9754
lig_608@F1            GLN_189@H       GLN_189@N        1       0.0001       2.8987     168.1758
lig_608@N2           HIE_41@HE2      HIE_41@NE2        1       0.0001       2.9411     147.0443

From this I need the third column (Donor) and the fifth column (Frac), and I want to plot a 2D histogram of the data using only the rows whose Frac value is greater than 0.01. So in the example above, only the following data should be considered:

#Donor                #Frac
GLU_166@N              0.6865 
HIE_163@NE2            0.4160
ASN_142@ND2            0.0788
THR_25@OG1             0.0109

The 2D histogram should show Donor on the X axis and Frac on the Y axis (in %).

Until now, I had to add the following lines by hand to the reduced two-column data file so that gracebat would recognize it as a 2D bar plot:

@    title  "No title"
@    xaxis  label "Donor"
@    yaxis  label "Frac"
@s0 line type 0
@TYPE bar
# here is the data in 2 column format

Is it possible to automate this post-processing so that the bar plot is produced on the fly? Alternatively, I would be grateful for a sed solution that edits the data file on the fly, reducing it to two columns and inserting at the beginning the @ lines required for the bar graph, using something like:

sed -i 's/old-text/new-text/g' datafile

sed isn't meant for this kind of task; you should use awk:

awk  '
    BEGIN {
        print "@ title \"No title\""
        print "@ xaxis label \"Donor\""
        print "@ yaxis label \"Frac\""
        print "@s0 line type 0"
        print "@TYPE bar"
    }
    NR > 1 && $5 > 0.01 { print $3, $5 }
' file.txt
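The question asks for Frac in percent, which the command above leaves as a fraction. A minimal sketch of a variant that multiplies by 100 and takes the cutoff as a parameter follows; the function name `hbond_to_grace` and the "Frac (%)" label are assumptions made for illustration, not part of the original answer. Whether gracebat accepts text labels in the X column is not confirmed here, so treat the output format as a starting point.

```shell
# Hypothetical helper: reduce the table to "Donor Frac(%)" with a Grace header.
# Arguments: $1 = input file, $2 = Frac cutoff (defaults to 0.01).
hbond_to_grace() {
    awk -v min="${2:-0.01}" '
        BEGIN {
            print "@ title \"No title\""
            print "@ xaxis label \"Donor\""
            print "@ yaxis label \"Frac (%)\""
            print "@s0 line type 0"
            print "@TYPE bar"
        }
        # Skip comment lines such as the "#Acceptor ..." header.
        /^#/ { next }
        # Keep rows above the cutoff; convert the fraction to a percentage.
        $5 > min { printf "%s %.2f\n", $3, $5 * 100 }
    ' "$1"
}
```

For example, `hbond_to_grace file.txt > reduced.dat` writes the header and the filtered rows to `reduced.dat` (a hypothetical output name), and `hbond_to_grace file.txt 0.05` raises the cutoff to 0.05.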

You could also do this with a Gnuplot script generated on the fly, e.g.:

cat <<EOS | gnuplot > output.png
set term pngcairo size 1280,960
set xtics noenhanced
set xlabel "Donor"
set ylabel "Frac"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk 'NR == 1 || \$5 > 0.01' infile.tsv" using 0:5:xtic(3) with boxes
EOS

This produces a PNG file:

[Image: bar plot of the requested table]
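To show Frac in percent, as the question asks, the Gnuplot approach can scale the fifth column inside the `using` clause. The sketch below only prints the script so it can be inspected before plotting; the function name `make_gnuplot_script` and the "Frac (%)" label are assumptions made for illustration.

```shell
# Hypothetical helper: print a gnuplot script that plots Frac as a percentage.
# Arguments: $1 = input file, $2 = Frac cutoff (defaults to 0.01).
make_gnuplot_script() {
    infile=$1
    cutoff=${2:-0.01}
    # Unquoted heredoc: $cutoff and $infile expand now, while \$5 is
    # written out as a literal $5 for awk and gnuplot to interpret later.
    cat <<EOS
set term pngcairo size 1280,960
set xtics noenhanced
set xlabel "Donor"
set ylabel "Frac (%)"
set key off
set style fill solid 0.5
set boxwidth 0.9
plot "<awk -v min=$cutoff 'NR > 1 && \$5 > min' '$infile'" using 0:(\$5*100):xtic(3) with boxes
EOS
}
```

Piping the result into gnuplot, as in `make_gnuplot_script infile.tsv | gnuplot > output.png`, should then write the image. Input file names containing quotes would break the generated command, so this sketch assumes simple paths.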
