
How to print the last line of each ID?

I have a list of IDs (exons), each with multiple suffixes, and I want to get the last line for each ID.

Input:

NM_203_exon_19
NM_203_exon_20
NM_0217_exon_7
NM_0217_exon_8
NM_0217_exon_9
NM_91_exon_14
NM_91_exon_15
NM_91_exon_16
NM_91_exon_17

Desired output:

NM_203_exon_20
NM_0217_exon_9
NM_91_exon_17

$ tac INPUTFILE | awk -F'_' '!a[$1FS$2]++' | tac
NM_203_exon_20
NM_0217_exon_9
NM_91_exon_17
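
The pipeline above comes with no explanation, so here is a minimal sketch of how it appears to work, assuming GNU tac is available. The temporary file and the array name seen are placeholders chosen for illustration, not part of the original answer. The first tac reverses the input so that the last line of each ID is seen first, the awk condition is true only the first time a key (fields 1 and 2 joined by "_") appears, and the second tac restores the original order.

```shell
# Recreate the sample input in a temporary file (placeholder path).
input=$(mktemp)
printf '%s\n' NM_203_exon_19 NM_203_exon_20 \
  NM_0217_exon_7 NM_0217_exon_8 NM_0217_exon_9 \
  NM_91_exon_14 NM_91_exon_15 NM_91_exon_16 NM_91_exon_17 > "$input"

# Step 1: reverse the file so the last line of each ID comes first.
# Step 2: keep a line only the first time its key ($1 "_" $2) is seen;
#         seen[key]++ is 0 (false) on first sight, so !seen[key]++ is true.
# Step 3: reverse again to put the surviving lines back in input order.
result=$(tac "$input" | awk -F'_' '!seen[$1 FS $2]++' | tac)
echo "$result"

rm -f "$input"
```

Because the key is tracked in an array rather than compared with the previous line, this variant should not need the lines of one ID to be adjacent.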

As you scan each line, you can store the previous ID together with its line, and print that stored line whenever the new ID is different:

$ awk -F'_exon_' '{if($1 != id && last)print last; id=$1; last=$0} END{print last}' file
NM_203_exon_20
NM_0217_exon_9
NM_91_exon_17
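
The same idea can be wrapped in a shell function so it works in a pipeline as well as on a file. This is only a sketch: the function name last_per_id is hypothetical, and it assumes every line has the form <ID>_exon_<n> with all lines for one ID adjacent. It tests NR rather than the stored line, so an empty input prints nothing.

```shell
# Hypothetical helper: print the last line of each consecutive run of IDs.
last_per_id() {
  awk -F'_exon_' '
    NR > 1 && $1 != id { print last }  # ID changed: emit last line of the previous run
    { id = $1; last = $0 }             # remember the current ID and line
    END { if (NR) print last }         # emit the final run, unless input was empty
  ' "$@"
}

# Usage: read from a pipe; NM_5 has a single line and is still printed.
printf '%s\n' NM_203_exon_19 NM_203_exon_20 NM_5_exon_1 \
  NM_91_exon_16 NM_91_exon_17 | last_per_id
```

Passing "$@" through to awk means the function reads standard input when called without arguments and reads the named files otherwise.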

Since your list is already ordered, one idea (without using awk) is to put every line into an array and iterate through it. When the ID changes from one element to the next, the previous element was the last line for its ID, so you print it. The loop never prints the final element, so you print the last item of the array once the loop ends.

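#!/usr/bin/bash
# Read every line of the file into an array (mapfile needs bash 4 or later).
mapfile -t array < filename
for ((i = 0; i < ${#array[@]} - 1; i++))
do
  # Compare only the ID part (everything before "_exon_") of adjacent lines.
  if [ "${array[i]%_exon_*}" != "${array[i+1]%_exon_*}" ]; then
    echo "${array[i]}"
  fi
done
# The final line always closes the last ID group.
echo "${array[${#array[@]}-1]}"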

As the file is already sorted by suffix within each ID, print the last line for each ID:

awk -F"_" 'NR==1{prev=$2} $2==prev{a=$0} $2!=prev{print a; prev=$2; a=$0} END{print a}' file

Output:

NM_203_exon_20
NM_0217_exon_9
NM_91_exon_17

With GNU sort for -s (stable sort); note that the output is ordered by the sort key rather than by input order:

$ tac file | sort -t_ -k2,2 -su
NM_0217_exon_9
NM_203_exon_20
NM_91_exon_17
