
Python split a list into several lists for each new line character



['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

Expected output:

[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']]


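# strip the trailing '\n' from every item except the blank '\n' separators
for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")

# join everything with ',' and split the result back apart at the '\n' separators
[item.split(",") for item in ','.join(lst).split('\n') if item]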

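But since I join and split with a comma rather than a single space, I end up with '' (empty string) entries after splitting into several lists. How can I prevent this?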

[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27', ''],
 ['', 'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27', '']]
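To see where the '' entries come from, the sketch below replays the question's steps and keeps the intermediate values. The comma written just before each '\n' and the one just after it survive the split('\n'), so splitting each chunk on ',' yields empty fields at the edges. Dropping empty fields is one possible fix; it assumes no line contains a literal comma of its own. The names cleaned, joined, chunks and result are illustrative only.

```python
lst = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n', '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n', '\n',
]

# Same preprocessing as in the question: strip '\n' from every non-separator item.
cleaned = [s if s == '\n' else s.rstrip('\n') for s in lst]

joined = ','.join(cleaned)
# joined is 'chain 2109 ...,27,\n,chain 2032 ...,27,\n'

chunks = [chunk for chunk in joined.split('\n') if chunk]
# chunks[0] ends with ',' and chunks[1] starts with ',': that is where '' comes from

# Assumption: no line contains a comma itself, so empty fields are always artifacts.
result = [[field for field in chunk.split(',') if field] for chunk in chunks]
```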


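list1 = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

list2 = []  # finished sublists
tmp = []    # sublist currently being built
for item in list1:
    if item != '\n':
        # keep the line, minus its trailing newline
        tmp.append(item.rstrip('\n'))
    else:
        #Note we aren't actually processing this item of the input list, as '\n' by itself is unwanted
        list2.append(tmp)
        tmp = []
if tmp:
    # flush a final group that was not followed by '\n'
    list2.append(tmp)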


Because your original list ends with the separator '\n', splitting it produces an empty sublist as the last item. The if sublist check filters it out.

from more_itertools import split_at

original = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n',
    '\n',
]

# split at every '\n' item (separators are dropped), then strip each remaining line
processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist  # skip the empty sublist produced by the trailing '\n'
]


[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']]
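For readers who cannot add the third-party more_itertools dependency, here is a stdlib-only sketch of the same splitting step. split_list_at is a hypothetical helper, not the library's code; it is written to behave like split_at does for this input, including yielding the group after the last separator even when that group is empty, which is the trailing empty sublist described above.

```python
def split_list_at(items, is_sep):
    """Yield the runs of items between separator items (separators dropped).

    Hypothetical stdlib-only helper mirroring how split_at is used above.
    """
    group = []
    for item in items:
        if is_sep(item):
            yield group
            group = []
        else:
            group.append(item)
    # The tail after the last separator is yielded even when it is empty,
    # which is why a list ending in '\n' produces a trailing [].
    yield group


original = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n', '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n', '\n',
]

raw = list(split_list_at(original, lambda s: s == '\n'))
# raw has three groups; the last one is [] because original ends with '\n'

processed = [[item.rstrip() for item in sublist] for sublist in raw if sublist]
```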

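Use a list comprehension to remove the '\n' items from the list, and use .strip() to remove the trailing \n from the lines you want to keep. Then iterate over the temporary list (blist) two items at a time and build the final list.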

alist = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

# use a list comprehension to remove the '\n' items from the list
# and use .strip() to remove the trailing \n from the remaining strings
blist = [i.strip() for i in alist if i != '\n']

# each group is assumed to be exactly two lines: a chain line and a count line
final_list = []
for i in range(0, len(blist), 2):
    final_list.append([blist[i], blist[i + 1]])
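Stepping through blist two at a time only works if every group has exactly two lines. The sketch below makes that assumption explicit: an odd line count raises a clear ValueError instead of an IndexError on blist[i + 1], and the pairing is done by zipping the even and odd slices. Note that an even count does not prove the assumption holds; a group of three lines next to a group of one would still be paired wrongly.

```python
alist = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n', '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n', '\n',
]

blist = [i.strip() for i in alist if i != '\n']

# An odd number of lines means at least one group is not a pair.
if len(blist) % 2 != 0:
    raise ValueError(f"expected an even number of lines, got {len(blist)}")

# Pair item 0 with 1, 2 with 3, ... by zipping the even and odd slices.
final_list = [[first, second] for first, second in zip(blist[0::2], blist[1::2])]
```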


from itertools import groupby

seq = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

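# group consecutive items by whether they are content lines (True) or '\n' separators (False)
result = [
    [string.rstrip() for string in group]
    for key, group in groupby(seq, lambda s: s != "\n")
    if key  # keep only the runs of content lines
]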

[
    ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
    ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']
]
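To show why the if key filter is enough, the sketch below materializes what groupby produces for this input. groupby groups consecutive items that share the same key, so with the key s != "\n" the content lines form True runs and the separators form False runs. The intermediate names runs and keys are illustrative.

```python
from itertools import groupby

seq = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n', '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n', '\n',
]

# Each group is turned into a list right away, because a groupby group
# is only valid until the iterator advances to the next key.
runs = [(key, list(group)) for key, group in groupby(seq, lambda s: s != "\n")]
keys = [key for key, _ in runs]
# the keys alternate: content run, separator run, content run, separator run

# Keeping only the True runs discards the separators.
result = [[s.rstrip() for s in group] for key, group in runs if key]
```

Because several consecutive '\n' items would fall into a single False run, this approach never produces empty sublists, so it needs no separate emptiness check.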


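Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.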
