Deleting all lines matching a specific pattern while retaining the first matching line in bash

I want to edit a GTF file by deleting every line that matches the pattern 'FAT1' except the first one, and by modifying the coordinates (4th and 5th columns) of the line that is kept.

#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1       havana  exon    137682  137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1       havana  gene    139790  140339  gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana"; 
1       havana  exon    140001  140101 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";
1       havana  gene    143401  145401  gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana"; 

Expected output:

#!genome-build GRCh38.p7
#!genome-version GRCh38
#!genome-date 2013-12
#!genome-build-accession NCBI:GCA_000001405.22
#!genebuild-last-updated 2016-06
1       havana  exon    137682  137965 gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.16"; gene_source "havana";
1       havana  gene    139790  140339  gene_id "ENSG00000239906"; gene_version "1"; gene_name "RP11-34P13.14"; gene_source "havana"; 
1       havana  exon    147653  148000 gene_id "ENSG00000269981"; gene_version "1"; gene_name "FAT1"; gene_source "havana";

I tried something like this:

    # Keep only the first entry for the FAT1 gene.
    awk '/"ENSG00000269981"/&&c++ {next} 1' ref.gtf > ref_edit.gtf

    # Then manually edit the coordinates in vim.

But I'm sure there is a more reasonable solution.

Could you please try the following:

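    awk -v new_fourth_col="147653" -v new_fifth_col="148000" '
    BEGIN{
      OFS="\t"                     # write tab-separated output
    }
    /gene_name "FAT1"/{
      if(++count==1){              # only the first FAT1 line is kept
        $4=new_fourth_col          # new start coordinate
        $5=new_fifth_col           # new end coordinate
        print
      }
      next                         # skip any later FAT1 lines
    }
    {
      $1=$1                        # rebuild the record so OFS is applied
      print
    }
    ' Input_file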

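I have also made the output tab-delimited. Because $1=$1 rebuilds each record with OFS, every run of whitespace on a line becomes a single tab, including the spaces in the header lines and inside the attribute column.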
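
If the real file is already tab-delimited, as the GTF format specifies, a variant that splits on tabs only may be preferable, because it leaves the header lines and the attribute column untouched. The following is a minimal sketch under that assumption. The sample file, the temporary directory, and the variable names new_start and new_end are made up for illustration and are not part of the original question.

```shell
#!/usr/bin/env bash
# Sketch: keep only the first FAT1 record and replace its coordinates,
# printing every other line unchanged.
# Assumption: the input is tab-delimited, as the GTF format specifies.
set -euo pipefail

# Hypothetical sample input so that the example runs on its own.
workdir=$(mktemp -d)
{
  printf '%s\n' '#!genome-build GRCh38.p7'
  printf '1\thavana\texon\t137682\t137965\tgene_id "ENSG00000239906"; gene_name "RP11-34P13.16";\n'
  printf '1\thavana\texon\t140001\t140101\tgene_id "ENSG00000269981"; gene_name "FAT1";\n'
  printf '1\thavana\tgene\t143401\t145401\tgene_id "ENSG00000269981"; gene_name "FAT1";\n'
} > "$workdir/ref.gtf"

awk -v new_start="147653" -v new_end="148000" '
BEGIN { FS = OFS = "\t" }          # split and join on tabs only
/gene_name "FAT1"/ {
  if (++count == 1) {              # first FAT1 line: update and keep it
    $4 = new_start
    $5 = new_end
    print
  }
  next                             # drop every later FAT1 line
}
{ print }                          # all other lines pass through as-is
' "$workdir/ref.gtf" > "$workdir/ref_edit.gtf"

cat "$workdir/ref_edit.gtf"
```

Since no field is assigned on the non-matching lines, awk prints them exactly as read. Only the kept FAT1 line is rebuilt, and it is rejoined with the same tab separator it was split on. If the input uses spaces between columns instead, this variant will not find the coordinates in $4 and $5.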
