
Grep match only specific lines, but keep context

I have a file where I'm looking for a pattern "N" on only the even-numbered lines. When a line matches, I want to keep the context -- the odd-numbered line above it.

I understand how to keep the context using -A, -B, -C but the pattern "N" will also possibly match the odd-numbered lines, so the only way I can think of solving the problem is by separating the even and odd lines before using grep, thus removing the context.

Is there a way to do this without first extracting the line numbers that grep matched and then pulling those specific lines out of the file afterwards? I suspect I might be able to do it with awk, but I'm not sure.

I'm trying to optimize code that I believe already works, because the files it will process are huge and a run takes hours.


I'm trying to find the DNA sequences that contain "N"s and put them in one file, and put the sequences that contain no "N"s in another file. The ID lines can also contain "N"s, however. In the new files, I want each ID line to stay attached to its sequence, on the line directly above it.

Sample Input:

>100000|NODE_2_length_277_cov_4.245487
ATCTTTTAACCCCAAAAACTCAAGTATGTGAGCCAAGTGAACATAACTGCATAAATATCAGGCTCCAAAATAATCTACTGCTTGTTGTGTAGATATAGAGCACACAATTTCTTTTTTAAAGCCCTCCCTTTCACTCTCTCTATCCCACACCCAGAAAAACTCCTATTTAGAGAAAGCCACACCTATCACTAAGAGCAAACCAACCTTTCAAAAAAAAAAAAAAAACACATTAGGAGCAAACTGTTAGGAGCCATTCAAAACCAAAGGAAATGCCAAGACACACACACACACACACACACACAC
>100001|NODE_1_length_426_cov_11.427230
AAATATATAAAAAACCTGTGTTGTGACAACAGGTTGAGAAGTAATGAGAAAATGGACGAATTAGTTCAGGATGTCTCAAAGCAGATTTCTTTCCACTTAATCTCGATGTCCTACGAAAATGCTGACTTAGGTTGTAGTTTATGTTTCTTAGATTCCAATATTTTAAAATGGCCCTTGAAATTATATTAAAAAGCTCATGAACAAGTGCATAATCAATGATAAATGAATATTTATGGTTGAGATTTGGGAATTATTAATCAATATACCTCTATACTCTTGGCTCTCTTGAAGTTTAATTCAAGTGTATTTAATTAGATTCCTACCCCAAATCAACTTTAAGAAGGCTGCTTTTCTTCTNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNCG

With awk:

seq 10 | 
awk -v pattern='[26]' '
  FNR % 2 == 1 {odd = $0}
  FNR % 2 == 0 && $0 ~ pattern {print odd; print}
'
1
2
5
6

With your sample input:

awk  '
  FNR % 2 == 1 {odd = $0}
  FNR % 2 == 0 {
    if (/N/) 
      file = FILENAME ".with_N"
    else 
      file = FILENAME ".no_N"
    print odd > file
    print     > file
  }
' myfile
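
As a quick sanity check, the same program can be run on a tiny file. This is only a sketch: the two records, their IDs, and the temporary directory are made up for illustration, and it assumes every record is exactly two lines (ID, then sequence), as in the sample input.

```shell
# Hypothetical two-record test file; the IDs and sequences are invented.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf '%s\n' '>id1|NODE_1' 'ACGT' '>id2|NODE_2' 'ACNNGT' > myfile

awk '
  FNR % 2 == 1 {odd = $0}          # remember the ID line
  FNR % 2 == 0 {                   # sequence line: choose the output file
    if (/N/)
      file = FILENAME ".with_N"
    else
      file = FILENAME ".no_N"
    print odd > file
    print     > file
  }
' myfile

# Show what ended up in each output file.
cat myfile.with_N
cat myfile.no_N
```

Both IDs contain an "N" (in NODE), but only the second record's sequence does, so only that record should land in myfile.with_N and the first one in myfile.no_N.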

Another solution, with fewer keystrokes:

awk '!(NR%2) && /N/ {print p; print}{p=$0}'

The !(NR%2) idiom selects the even-numbered lines. The second block, {p=$0}, saves every line as the "previous" line unconditionally; that is harmless, because p is printed only when an even-numbered line matches.
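
To see that an "N" on an odd-numbered (ID) line does not trigger a match, the one-liner can be fed a small made-up input on stdin. The four lines below are hypothetical test data, not taken from the question, and the variable name out is only used to capture the result.

```shell
# Hypothetical input: the first ID contains an N, but its sequence does not;
# the second sequence does contain N.
out=$(printf '%s\n' '>a_N' 'ACGT' '>b' 'ANNT' |
  awk '!(NR%2) && /N/ {print p; print}{p=$0}')

# Only the second record (ID line plus sequence line) should be printed.
printf '%s\n' "$out"
```

Here the first record is skipped even though its ID line contains "N", because the /N/ test is applied only when NR is even.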
