Creating a new .txt with lines from another .txt
I have a document with this structure (it's large, more than 20000 lines):
@A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFF:F:FFFFFFFFFFFF
@A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFF:FFFFFFFF,FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFF:,F:FFFFFFF
@A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
+
I want to keep only two lines from each block: the line starting with @ and the one right after it. Like this:
@A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
@A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA
@A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG
@A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
I have tried this code:
import fileinput
from collections import deque
output_file = 'cola1_fasta.txt'
buscado = '@'
contexto = deque([], 3)  # keeps only the last 3 lines
with open(output_file, "w") as f_out:
    for line in fileinput.input(files=["cola1.txt"]):
        contexto.append(line)
        if len(contexto) < 3:
            continue
        if buscado in contexto[1]:
            f_out.writelines(contexto)
But I can't obtain this. Do you have any suggestions? Many thanks!
Loop over the input file line by line and check whether each line starts with @. If it does, write that line to the output file and set the header_row flag to True, so that on the next iteration we know to write the following line to the file as well.
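input_filename = r"cola1.txt"
output_filename = r"cola1_fasta.txt"
header_row = False  # True right after a line starting with "@" was written
with open(input_filename) as in_f:
    with open(output_filename, "wt") as out_f:
        for line in in_f:
            if line.startswith("@"):
                # header line: keep it and remember to keep the next line too
                out_f.write(line)
                header_row = True
            elif header_row:
                # line right after a header: keep it and reset the flag
                out_f.write(line)
                header_row = False
            else:
                # any other line is replaced by a blank line
                out_f.write("\n")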
cola1_fasta.txt:
@A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
@A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA
@A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG
@A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
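Note that, as written, the else branch also writes a blank line for every input line that is neither a header nor the line right after a header (here, the + and quality lines). Those blank lines are not shown in the listing above; delete the else branch to get exactly that output.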
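As a minimal sketch, the same flag-based idea can be packaged as a small generator that works on any iterable of lines. The name header_and_next and the sample data are made up for illustration, and this variant simply skips the unwanted lines instead of writing blank ones.

```python
import io


def header_and_next(lines, marker="@"):
    """Yield each line starting with `marker` plus the line right after it."""
    take_next = False
    for line in lines:
        if line.startswith(marker):
            yield line
            take_next = True
        elif take_next:
            yield line
            take_next = False
        # any other line is skipped, so no blank lines are produced


# Tiny in-memory stand-in for cola1.txt (hypothetical data, same 4-line layout).
sample = io.StringIO(
    "@read1 2:N:0\nACGT\n+\nFFFF\n"
    "@read2 2:N:0\nTTGA\n+\nF:FF\n"
)
result = list(header_and_next(sample))
print("".join(result), end="")
```

With real files, the generator could be passed straight to writelines, for example dst.writelines(header_and_next(src)) inside a with block that opens cola1.txt as src and cola1_fasta.txt as dst.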
Take advantage of the fact that file objects are iterators in Python: loop over the file line by line, check whether the line starts with @, and if so write that line and the following one (fetched with next) to the output file:
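input_file = 'cola1.txt'
output_file = 'cola1_fasta.txt'

with open(output_file, 'w') as out_file, open(input_file) as in_file:
    for line in in_file:
        if line.startswith('@'):
            out_file.write(line)
            # next() pulls the following line from the same iterator;
            # the '' default avoids StopIteration if '@' is on the last line
            out_file.write(next(in_file, ''))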
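One caveat applies to both answers. The sample looks like FASTQ, where every record has four lines (header, sequence, +, quality). That is an assumption, since the question never names the format. In FASTQ a quality line may itself begin with @, and a startswith('@') test would then treat it as a header. Under that assumption, a sketch that reads the file in groups of four lines and keeps the first two of each group avoids the problem. The function name and the sample data below are hypothetical.

```python
import io
from itertools import islice


def first_two_of_each_record(lines, record_size=4):
    """Yield the first two lines of every `record_size`-line record."""
    it = iter(lines)
    while True:
        # islice takes up to record_size lines; an empty list means end of input
        record = list(islice(it, record_size))
        if not record:
            break
        yield from record[:2]


# Hypothetical data: the first quality line deliberately starts with "@".
sample = io.StringIO(
    "@read1\nACGT\n+\n@FFF\n"
    "@read2\nTTGA\n+\nF:FF\n"
)
kept = list(first_two_of_each_record(sample))
print("".join(kept), end="")
```

This relies on position rather than content, so it only works if every record really has exactly four lines; a file with wrapped sequences or stray blank lines would need a proper parser instead.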