Insert a line after a pattern match

Question

I have a file as follows:

Scaffold2   GeneWise        mRNA    3038    6649 
Scaffold2   GeneWise        CDS     3038    3480
Scaffold2   GeneWise        CDS     4175    4291
Scaffold3   GeneWise        mRNA    2824    15173
Scaffold3   GeneWise        CDS     2824    3302
Scaffold3   GeneWise        CDS     4143    4344

I want to have this output:

Scaffold2   GeneWise        mRNA    3038    6649 
Scaffold2   GeneWise        CDS     3038    **3480**
Scaffold2   GeneWise        1st_intron     **3480    4175**
Scaffold2   GeneWise        CDS     **4175**    4291
Scaffold3   GeneWise        mRNA    2824    15173
Scaffold3   GeneWise        CDS     2824    **3302**
Scaffold3   GeneWise        1st_intron     **3302    4143**
Scaffold3   GeneWise        CDS     **4143**    4344

The rule is: if column 3 is 'mRNA', take the 5th column of the next line and the 4th column of the line after that, and insert a new line between those two lines. The new line has these two values as its 4th and 5th columns (the bold numbers above) and '1st_intron' as its third column.

I have never dealt with a problem like this. If you could give me a hint, that would be great.

Answer 1

You can use this simple awk:

awk '$3=="mRNA"{p=1; print; next}
     p{s=$1 FS $2 FS "1st_intron" FS $5; print; p=0; next}
     s{print s, $4; s=""} 1' file | column -t

Output:

Scaffold2  GeneWise  mRNA        3038  6649
Scaffold2  GeneWise  CDS         3038  3480
Scaffold2  GeneWise  1st_intron  3480  4175
Scaffold2  GeneWise  CDS         4175  4291
Scaffold3  GeneWise  mRNA        2824  15173
Scaffold3  GeneWise  CDS         2824  3302
Scaffold3  GeneWise  1st_intron  3302  4143
Scaffold3  GeneWise  CDS         4143  4344

column -t is only used to format the output.
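For readers who find the p and s flags hard to follow, here is the same logic spelled out with comments and run on part of the sample data through a here-document. This is only a sketch: the wrapper name insert_first_intron is made up for illustration, and column -t is left out so the intron line comes out joined with single spaces.

```shell
# Hypothetical wrapper around the one-liner above; reads the table on stdin.
insert_first_intron() {
    awk '
        # mRNA line: print it and flag that the next line is the first CDS.
        $3 == "mRNA" { p = 1; print; next }
        # First CDS: start the intron line with its end coordinate (column 5).
        p { s = $1 FS $2 FS "1st_intron" FS $5; print; p = 0; next }
        # Second CDS: finish the intron line with its start coordinate (column 4).
        s { print s, $4; s = "" }
        # Print every line that was not already printed above.
        1
    '
}

insert_first_intron <<'EOF'
Scaffold2   GeneWise        mRNA    3038    6649
Scaffold2   GeneWise        CDS     3038    3480
Scaffold2   GeneWise        CDS     4175    4291
EOF
```

Because s is cleared as soon as the intron line is printed, only the gap between the first two CDS lines of each mRNA is reported.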

Answer 2

$ cat tst.awk
p1 == "mRNA" { x=$5 }
p2 == "mRNA" { print $1, $2, "1st_intron", x, $4 }
{ print; p2=p1; p1=$3 }

$ awk -f tst.awk file | column -t
Scaffold2  GeneWise  mRNA        3038  6649
Scaffold2  GeneWise  CDS         3038  3480
Scaffold2  GeneWise  1st_intron  3480  4175
Scaffold2  GeneWise  CDS         4175  4291
Scaffold3  GeneWise  mRNA        2824  15173
Scaffold3  GeneWise  CDS         2824  3302
Scaffold3  GeneWise  1st_intron  3302  4143
Scaffold3  GeneWise  CDS         4143  4344

Answer 3

Perl solution.

$intron is 0 when nothing needs to be done. It is set to 1 when an mRNA line is processed, so that on the next line $left can remember the end coordinate (5th column) and $intron becomes 2. On the line after that, state 2 prints the intron line and resets $intron to 0.

#!/usr/bin/perl
use warnings;
use strict;

my $intron = 0;
my ($left, $right);
while (<>) {
    my @items = split;

    if (1 == $intron) {
        $left = $items[4];
        $intron = 2;

    } elsif (2 == $intron) {
        print join "\t", @items[0, 1], '1st_intron', $left, $items[3];
        print "\n";
        $intron = 0;
    }

    $intron = 1 if 'mRNA' eq $items[2];
    print;
}

Answer 4

awk has a nice look-ahead feature, "getline":

awk '$3=="mRNA"{print;getline;c5=$5;print;getline;print $1," ",$2,"       1st_intron",c5,$4;print}'

Tested:

Scaffold2   GeneWise        mRNA    3038    6649
Scaffold2   GeneWise        CDS     3038    3480
Scaffold2   GeneWise        1st_intron 3480 4175
Scaffold2   GeneWise        CDS     4175    4291
Scaffold3   GeneWise        mRNA    2824    15173
Scaffold3   GeneWise        CDS     2824    3302
Scaffold3   GeneWise        1st_intron 3302 4143
Scaffold3   GeneWise        CDS     4143    4344
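
One thing to watch with the getline one-liner: it only prints lines handled inside the mRNA block, so any further lines of a gene (a third CDS, for example) would be dropped. A minimal sketch of a more defensive variant, assuming the same five-column layout, is below. The function name add_first_intron_getline and the variable end_first are made up for illustration, and the intron line is joined with single spaces, so pipe the result through column -t if you want aligned columns.

```shell
# Hypothetical helper; reads the table on stdin.
add_first_intron_getline() {
    awk '
        $3 == "mRNA" {
            print                         # the mRNA line itself
            if ((getline) <= 0) exit      # input ended before the first CDS
            end_first = $5                # end coordinate of the first CDS
            print                         # the first CDS line
            if ((getline) <= 0) exit      # input ended before the second CDS
            print $1, $2, "1st_intron", end_first, $4
        }
        { print }                         # second CDS and every other line
    '
}
```

Checking the return value of getline keeps the script from reusing a stale record when the input ends early, and the trailing { print } passes through everything that is not part of the mRNA, CDS, CDS triple. It still assumes that the two lines after each mRNA line are CDS lines.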

4 answers

Answer 1: score 2, accepted, 2015-09-27 16:10:03
Answer 2: score 1, 2015-09-27 16:43:14
Answer 3: score 0, 2015-09-27 15:40:21
Answer 4: score 0, 2015-09-27 15:47:05
