Creating a new .txt with lines from another .txt

I have a document with this structure (it's large, more than 20000 lines):

@A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF:FFFFFF:F:FFFFFFFFFFFF
@A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFF:FFFFFFFF,FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
@A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:,FFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFF:FFF:FFFFFFFFFFFFFFFFFFFFFFFF,FFFFF:,F:FFFFFFF
@A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
+

I want to keep only the lines starting with @ and the line that follows each of them, like this:

    @A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
    CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT
   
    
    @A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
    CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA
    
    
    @A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
    CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG
    
    
    @A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
    CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT

I have tried this code:

import fileinput
from collections import deque
output_file = 'cola1_fasta.txt' 
buscado = '@'

contexto = deque([], 3)  # keeps only the last 3 lines


with open(output_file, "w") as f_out:
    for line in fileinput.input(files=["cola1.txt"]):
        contexto.append(line)       
        if len(contexto) < 3:      
            continue
        if buscado in contexto[1]:  
            f_out.writelines(contexto) 

But I can't obtain this output. Do you have any suggestions? Many thanks!

Loop over the input file line by line and check whether each line starts with @. If it does, write that line to the output file and set the header_row flag to True, so that on the next iteration we know to write the following line as well. Every other line is replaced by a blank line, which gives the two blank lines between records shown in the question.

input_filename = r"cola1.txt"
output_filename = r"cola1_fasta.txt"

header_row = False
with open(input_filename) as in_f:
    with open(output_filename, "wt") as out_f:
        for line in in_f:
            if line.startswith("@"):
                out_f.write(line)
                header_row = True
            elif header_row:
                out_f.write(line)
                header_row = False
            else:
                out_f.write("\n")

Contents of cola1_fasta.txt:

@A00627:308:H227VDSX3:1:1201:30734:26349 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT


@A00627:308:H227VDSX3:1:1257:18828:34695 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGA


@A00627:308:H227VDSX3:1:1266:28809:10300 2:N:0:TGGCAGTA+GTACAGTG
CTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATTAAGAGAAGAGAAGAAACGCCCACGCCAGGAAACCCACTGGGTGCCCG


@A00627:308:H227VDSX3:1:1447:29315:13745 2:N:0:TGGCAGTA+GTACAGTG
CCCAGGAGCACCAGGAAGGGCAAGAGCACCCTGGCCTAGGGGATCATCTGGCCCAGGGTAGGGTAGGAACAGCCTCATGGTCTTCAGAGTTTGCCCCTTCCTGAGGGAAAGACATTTTAATATTTTTGGGTTGGCTGGACCAATCTCATT

Note that this implementation leaves 2 extra blank lines at the bottom of the text file.
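If the trailing blank lines are unwanted, one option is to collect the records first and put the separator only between them. The following is a minimal sketch of that idea, not part of the answer above; the function names keep_header_and_next and write_records are made up for illustration, and it assumes the whole file fits comfortably in memory.

```python
def keep_header_and_next(lines):
    """Return one string per record: the '@' line plus the line after it."""
    records = []
    header_row = False
    for line in lines:
        # Normalise line endings so every kept line ends with exactly one "\n"
        line = line.rstrip("\n") + "\n"
        if line.startswith("@"):
            records.append(line)
            header_row = True
        elif header_row:
            records[-1] += line
            header_row = False
    return records


def write_records(input_filename, output_filename):
    """Write the kept records with two blank lines between them, none at the end."""
    with open(input_filename) as in_f, open(output_filename, "w") as out_f:
        out_f.write("\n\n".join(keep_header_and_next(in_f)))
```

Calling write_records("cola1.txt", "cola1_fasta.txt") should then produce the same layout as above, but ending right after the last kept line.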

Take advantage of the fact that file objects are iterators in Python: loop over the file line by line, and whenever a line starts with @, write that line and the following one (fetched with next) to the output file:

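input_file = 'cola1.txt'
output_file = 'cola1_fasta.txt'

with open(output_file, 'w') as out_file, open(input_file) as in_file:
    for line in in_file:
        if line.startswith('@'):
            out_file.write(line)
            # next() advances the same iterator; '' guards against a header on the last line
            out_file.write(next(in_file, ''))
        else:
            # every other line becomes a blank line
            out_file.write('\n')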
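One caveat with testing for a leading @: the sample looks like FASTQ, where @ is also a valid quality character, so a quality line can begin with @ and be mistaken for a header. A minimal sketch that sidesteps this by reading the file in groups of four lines is shown below. It assumes every record occupies exactly four lines with no blank lines in between, and the function name fastq_header_and_sequence is hypothetical.

```python
from itertools import islice


def fastq_header_and_sequence(lines):
    """Yield (header, sequence) pairs, reading four lines per record."""
    lines = iter(lines)
    while True:
        # Take the next (up to) four lines as one record
        record = list(islice(lines, 4))
        if not record:
            break
        if len(record) < 2 or not record[0].startswith("@"):
            raise ValueError(f"unexpected record layout near: {record[0]!r}")
        # Only the first two lines are kept; '+' and the quality line are ignored
        yield record[0], record[1]
```

The pairs can be written out in a loop over an open file, for example `for header, seq in fastq_header_and_sequence(in_file): out_file.write(header + seq)`, adding whatever separator is wanted between records.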
