Reading specific parts of the data from a .txt file in Python

Question

>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC

This is the text file I am trying to read. I want to read each gene into a separate string and then add it to a list. Header lines starting with the '>' character mark where one gene ends and the next one begins.

with open('sequences1.txt') as input_data:
    for line in input_data:
            while line != ">":
                list.append(line)
    print(list)

When printed, the list should be:

list =["ATGATGATGGCG","GGCATATCCGGATACC","TAGCTAGCCCGC"]

Answer 1

with open('sequences1.txt') as input_data:
    sequences = []
    gene = []  # sequence lines of the gene currently being read
    for line in input_data:
        if line.startswith('>gene'):
            # a header closes the previous gene, if there is one
            if gene:
                sequences.append(''.join(gene))
                gene = []
        else:
            gene.append(line.strip())
sequences.append(''.join(gene))  # append last gene
print(sequences)

Output:

['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
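A minimal sketch of the same header-driven grouping as Answer 1, wrapped in a function so it can be reused and tested without a file. The name `read_genes` is hypothetical, and the sketch assumes (as the question describes) that any line starting with '>' is a header; the sample text is fed in through `io.StringIO` in place of `sequences1.txt`.

```python
import io
from typing import Iterable, List


def read_genes(lines: Iterable[str]) -> List[str]:
    """Return one string per gene; a line starting with '>' (assumed header) ends the current gene."""
    sequences: List[str] = []
    gene: List[str] = []  # sequence lines of the gene being read
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if gene:  # the previous gene is complete
                sequences.append(''.join(gene))
                gene = []
        else:
            gene.append(line)
    if gene:  # the last gene has no header after it
        sequences.append(''.join(gene))
    return sequences


sample = """>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC
"""
print(read_genes(io.StringIO(sample)))
# ['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
```

With a real file the call would be `with open('sequences1.txt') as f: print(read_genes(f))`. Guarding the final append with `if gene:` means an empty file gives `[]` rather than `['']`.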

Answer 2

sequences1.txt:

>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC

And then:

desired_text = []
with open('sequences1.txt') as input_data:
    content = input_data.readlines()
    content = [l.strip() for l in content if l.strip()]
    for line in content:
        if not line.startswith('>'):
            desired_text.append(line)

print(desired_text)

Output:

['ATGATGATGGCG', 'GGCATATC', 'CGGATACC', 'TAGCTAGCCCGC']
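The first version of Answer 2 keeps each sequence line separately, so gene2 comes out as two items. One way to join the lines of each gene is `itertools.groupby`, which splits the stripped lines into consecutive runs of header and non-header lines; each non-header run is one gene. This is a sketch: the function name `genes_with_groupby` is hypothetical, and the input is the sample text as a string rather than the file.

```python
from itertools import groupby
from typing import List


def genes_with_groupby(text: str) -> List[str]:
    # keep only non-empty lines, stripped of surrounding whitespace
    lines = [l.strip() for l in text.splitlines() if l.strip()]
    result: List[str] = []
    # groupby yields (key, run) for each run of consecutive lines with the same key
    for is_header, run in groupby(lines, key=lambda l: l.startswith('>')):
        if not is_header:
            result.append(''.join(run))  # one run of sequence lines = one gene
    return result


sample = ">gene1\nATGATGATGGCG\n>gene2\nGGCATATC\nCGGATACC\n>gene3\nTAGCTAGCCCGC\n"
print(genes_with_groupby(sample))
# ['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
```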

EDIT:

I skimmed the question too quickly the first time; here is a fixed version that produces the desired output:

with open('sequences1.txt') as input_data:
    content = input_data.readlines()
    # you may also want to remove empty lines
    content = [l.strip() for l in content if l.strip()]
    # flag: toggled on every header line ('>geneN')
    nextLine = False
    # list to save the sequences
    textList = []
    # accumulator used while the flag is False
    concatenated = ''
    for line in content:
        # 'gene' starts at index 1 in a header such as '>gene1'; -1 means not found
        find_TC = line.find('gene')

        if find_TC > 0:
            nextLine = not nextLine
        else:
            if nextLine:
                # flag is True: store the line on its own
                textList.append(line)
            else:
                if find_TC < 0:
                    # flag is False: join this line with the previous one
                    if concatenated != '':
                        concatenated = concatenated + line
                        textList.append(concatenated)
                    else:
                        concatenated = line

print(textList)

Output:

['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
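If the gene names in the headers matter, a variant of the same idea can store each sequence under its header. A sketch, assuming the text after '>' is the gene name and that names are unique (a repeated name would overwrite the earlier entry); `genes_by_name` is a hypothetical helper. Because dictionaries keep insertion order in Python 3.7+, `list(result.values())` gives the list the question asks for.

```python
from typing import Dict, Optional


def genes_by_name(text: str) -> Dict[str, str]:
    genes: Dict[str, str] = {}
    name: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('>'):
            name = line[1:]      # e.g. 'gene1' (assumed to be the gene name)
            genes[name] = ''
        elif name is not None:   # ignore sequence lines before the first header
            genes[name] += line
    return genes


sample = ">gene1\nATGATGATGGCG\n>gene2\nGGCATATC\nCGGATACC\n>gene3\nTAGCTAGCCCGC\n"
result = genes_by_name(sample)
print(result)
# {'gene1': 'ATGATGATGGCG', 'gene2': 'GGCATATCCGGATACC', 'gene3': 'TAGCTAGCCCGC'}
print(list(result.values()))
# ['ATGATGATGGCG', 'GGCATATCCGGATACC', 'TAGCTAGCCCGC']
```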

Answer 3

You have multiple mistakes in your code; look here:

with open('sequences1.txt', 'r') as file:
    genes = []  # avoid shadowing the built-in name `list`
    for line in file.read().split('\n'):
        # skip header lines and empty lines
        if not line.startswith(">") and len(line) > 0:
            genes.append(line)
    print(genes)
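The length check in Answer 3 is needed because `str.split('\n')` returns an empty string after a trailing newline, whereas `str.splitlines()` does not. A small illustration on a shortened, made-up sample string rather than the file:

```python
sample = ">gene1\nATGATGATGGCG\n>gene2\nGGCATATC\n"

# split('\n') leaves an empty string after the final newline ...
print(sample.split('\n'))   # ['>gene1', 'ATGATGATGGCG', '>gene2', 'GGCATATC', '']
# ... while splitlines() does not
print(sample.splitlines())  # ['>gene1', 'ATGATGATGGCG', '>gene2', 'GGCATATC']

# keep non-empty, non-header lines
genes = [line for line in sample.splitlines() if line and not line.startswith('>')]
print(genes)                # ['ATGATGATGGCG', 'GGCATATC']
```

Blank lines inside the text still show up as `''` with either method, so the comprehension keeps the `if line` test. Like Answer 3, this keeps the lines of a multi-line gene as separate items.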

Answer 4

Try this:

$ cat genes.txt
>gene1
ATGATGATGGCG
>gene2
GGCATATC
CGGATACC
>gene3
TAGCTAGCCCGC


$ python
>>> genes = []
>>> with open('genes.txt') as file_:
...   for line in file_:
...     if not line.startswith('>'):
...       genes.append(line.strip())
...
>>> print(genes)
['ATGATGATGGCG', 'GGCATATC', 'CGGATACC', 'TAGCTAGCCCGC']


4 answers

Answer 1
Score: 2, accepted, 2019-02-01 15:14:05

Answer 2
Score: 0, 2019-02-01 15:07:37

Answer 3
Score: 0, 2019-02-01 15:12:07

Answer 4
Score: 0, 2019-02-01 15:14:57
