How can I print two lines after the context i am interest in using python.
Example.fastq
@read1
AAAGGCTGTACTTCGTTCCAGTTG
+
'(''%$'))%**)2+'.(&&'/5-
@read2
CTGAGTTGAGTTAGTGTTGACTC
+
)(+-0-2145=588..,(1-,12
I can find the context of interest using...
fastq = open(Example.fastq, "r")
IDs = [read1]
with fastq as fq:
for line in fq:
if any(string in line for string in IDs):
Now that I have found read1 I want to print out the the following lines for read1. In bash i might use something like grep -A to do this. The desired output lines look like the following.
+
'(''%$'))%**)2+'.(&&'/5-
But in python i cant seem to find an equivalent tool. Perhaps "islice" might work but I don't see how I can get islice to start at the position of the match.
with fastq as fq:
for line in fq:
if any(string in line for string in IDs):
print(list(islice(fq,3,4)))
You can use next()
to advance an iterator (including files):
print(next(fq))
print(next(fq))
This consumes those lines, so the for
loop will continue with @read2
.
if you don't want the AAA...
line, you can also just consume it with next(fq)
. In full:
fastq = open(Example.fastq, "r")
IDs = [read1]
with fastq as fq:
for line in fq:
if any(string in line for string in IDs):
next(fq) # skip AAA line
print(next(fq).strip()) # strip off the extra newlines
print(next(fq).strip())
which gives
+
'(''%$'))%**)2+'.(&&'/5-
If you're handling FASTQ files, you're better off using a bioinformatics library like BioPython instead of rolling your own parser.
To get the exact result you requested, you can do:
from Bio.SeqIO.QualityIO import FastqGeneralIterator
IDs = ['read1']
with open('Example.fastq') as in_handle:
for title, seq, qual in FastqGeneralIterator(in_handle):
# The ID is the first word in the title line (after the @ sign):
if title.split(None, 1)[0] in IDs:
# Line 3 is always a '+', optionally followed by the same sequence identifier again.
print('+')
print(qual)
But you can't do much with the line of quality values on its own. Your next step will be almost certainly be to convert it to Phred quality scores . But this is notoriously complicated because there are at least three different and incompatible variants of the FASTQ file format . BioPython takes care of all the edge cases for you, so that you can just do:
from Bio.SeqIO import parse
IDs = ['read1']
with open('Example.fastq') as in_handle:
for record in parse(in_handle, 'fastq'):
if record.id in IDs:
print(record.letter_annotations["phred_quality"])
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.