How do I concatenate lines starting with a letter?
I am trying to concatenate lines in a text file into two lists. The first list would contain the lines starting with an uppercase letter, and the second the lines that start with a '_'. For instance:
_CAA35997.1 unnamed protein product [Bos taurus]
MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPK
REVCELNPDCDELADHIGFQEAYRRFYGPV
_CAA42669.1 beta-2-glycoprotein I, partial [Bos taurus]
PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWP
INTLKCMPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCP
First list=['MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPK REVCELNPDCDELADHIGFQEAYRRFYGPV','PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWPINTLKCMPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCP']
Second list=['_CAA35997.1','_CAA42669.1']
I have tried the following, which does not seem to work. Each new line is stored as a new entry in the first list instead of the lines being concatenated into one entry:
for i in seq.text:
    if (i=='_'):
        second_list.append(i)
    else:
        first_list.append(i)
The easiest way is to keep doing what you're currently doing (checking the prefix with startswith('_') instead of comparing the whole line to '_'), and then call str.join() afterwards to "concatenate" the entire list at once, in order:
for i in seq.text:
    if i.startswith('_'):
        second_list.append(i)
        # to more closely resemble the output you put in your question,
        # you might want to only append the part up to the first whitespace:
        # second_list.append(i.split()[0])
    else:
        first_list.append(i)
first_string = ''.join(first_list)
second_string = ''.join(second_list)
Using an empty string as the separator means that the items are concatenated directly to each other, with nothing in between. You can also use anything else as a separator (a comma ',', a space ' ', a newline '\n', or any combination), depending on what your desired output is.
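To make the effect of the separator concrete, here is a small sketch. The list `parts` and its two short strings are made up for illustration and are not taken from the question's file.

```python
# Hypothetical fragments standing in for lines collected in a list.
parts = ["MRTPMLL", "REVCELN"]

direct = "".join(parts)     # nothing between the items
comma = ",".join(parts)     # comma-separated
spaced = " ".join(parts)    # space-separated
lines = "\n".join(parts)    # one item per line

print(direct)  # MRTPMLLREVCELN
print(comma)   # MRTPMLL,REVCELN
print(spaced)  # MRTPMLL REVCELN
```

Note that str.join() is called on the separator and takes the list as its argument, and that every item in the list must already be a string.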
import re  # only needed for the commented-out regex alternative below

list1 = []
list2 = []
with open("your_path/test.txt", "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        if not stripped_line:
            continue  # skip the empty line
        # To consider '_' in the first list
        # x = re.findall(r"\b_\w+", stripped_line)
        if stripped_line.isupper():  # if x:
            list1.append(stripped_line)
        else:
            list2.append(stripped_line)
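The loops above sort lines into two lists but still keep each sequence line as its own entry. If the goal is one concatenated entry per record, as in the question's expected output, one possible approach is to collect sequence lines until the next '_' header appears and join them at that point. This is only a sketch under assumptions: the function name `group_records` is hypothetical, the input is assumed to be an iterable of text lines in the layout shown in the question, and any sequence lines appearing before the first header are silently ignored.

```python
def group_records(lines):
    """Return (sequences, ids): one joined sequence per '_' header line."""
    sequences = []
    ids = []
    chunks = None  # sequence lines of the current record; None before any header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("_"):
            if chunks is not None:
                # a new header closes the previous record
                sequences.append("".join(chunks))
            ids.append(line.split()[0])  # keep only the identifier
            chunks = []
        elif chunks is not None:
            chunks.append(line)
    if chunks is not None:
        sequences.append("".join(chunks))  # close the last record
    return sequences, ids


# Sample lines copied from the question.
sample = [
    "_CAA35997.1 unnamed protein product [Bos taurus]",
    "MRTPMLLALLALATLCLAGRADAKPGDAESGKGAAFVSKQEGSEVVKRLRRYLDHWLGAPAPYPDPLEPK",
    "REVCELNPDCDELADHIGFQEAYRRFYGPV",
    "_CAA42669.1 beta-2-glycoprotein I, partial [Bos taurus]",
    "PALVLLLGFLCHVAIAGRTCPKPDELPFSTVVPLKRTYEPGEQIVFSCQPGYVSRGGIRRFTCPLTGLWP",
    "INTLKCMPRVCPFAGILENGTVRYTTFEYPNTISFSCHTGFYLKGASSAKCTEEGKWSPDLPVCAPITCP",
]

first_list, second_list = group_records(sample)
print(second_list)      # ['_CAA35997.1', '_CAA42669.1']
print(len(first_list))  # 2
```

An open file object can be passed in place of `sample`, since iterating over a file yields its lines. A header with no sequence lines after it would produce an empty string in the first list, which keeps the two lists aligned by index.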