
Python split a list into several lists for each new line character

I have the following list, and I would like to split it into several lists wherever an element of the list is "\n".

Input:

['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

Expected output:

[
 ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']
]

I tried stripping the "\n" from the end of the elements, then used a modified version of the accepted answer from this post:

for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")

[item.split(",") for item in ','.join(lst).split('\n') if item]

But since I am joining and splitting on a comma instead of a single space, I get "" entries after splitting into several lists. How can I prevent this?

[
 ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290','27',''],
 ['','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064','27','']
]
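The empty strings come from the commas that ','.join places on either side of each bare "\n" element, so every chunk produced by split('\n') starts or ends with a comma. Below is a minimal sketch of one way to keep the join/split approach, assuming no element itself contains a comma. The shortened `lst` is illustrative data, not the original list.

```python
# Shortened stand-in for the list in the question (illustrative data).
lst = ['chain A\n', '27\n', '\n', 'chain B\n', '27\n', '\n']

# Strip the trailing newline from every element except the bare separators.
stripped = [s if s == '\n' else s.rstrip('\n') for s in lst]

# Join and split as before, but drop the empty fields left behind by the
# commas that surrounded each separator.
result = [
    [field for field in chunk.split(',') if field]
    for chunk in ','.join(stripped).split('\n')
    if chunk
]

print(result)  # [['chain A', '27'], ['chain B', '27']]
```

Note that this would also discard any element that was an empty string to begin with, which may or may not be what you want.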

Does this work for you?

list1 = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

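list2 = []
tmp = []
for item in list1:
    if item != '\n':
        tmp.append(item.rstrip('\n'))
    else:
        # A bare '\n' marks the end of a group: store the group and start a new one.
        # The separator itself is not kept.
        list2.append(tmp)
        tmp = []

# Keep a trailing group if the input does not end with '\n'.
if tmp:
    list2.append(tmp)

print(list2)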

I would recommend splitting your list with more_itertools.split_at.

Because your original list ends with the separator, '\n', splitting it produces an empty sublist as the final item. The if check excludes it.

from more_itertools import split_at

original = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n',
    '\n'
]

processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist
]

print(processed)

Output (line break added for clarity):

[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
 ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']]
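To see where that trailing empty sublist comes from, here is a standard-library-only sketch of the splitting behaviour. `split_at_sketch` is a hypothetical stand-in written for illustration, not the more_itertools implementation, and the short `data` list is made up.

```python
def split_at_sketch(items, is_separator):
    """Hypothetical stand-in that mimics splitting on separator items."""
    current = []
    for item in items:
        if is_separator(item):
            # Close the current group; the separator itself is dropped.
            yield current
            current = []
        else:
            current.append(item)
    # Whatever follows the last separator is emitted too, even if empty.
    yield current


data = ['a\n', 'b\n', '\n', 'c\n', '\n']
groups = list(split_at_sketch(data, lambda s: s == '\n'))
print(groups)  # [['a\n', 'b\n'], ['c\n'], []]

# Filtering on truthiness removes the empty trailing group.
non_empty = [g for g in groups if g]
print(non_empty)  # [['a\n', 'b\n'], ['c\n']]
```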

Use a list comprehension to drop the bare "\n" items from the list, and use .strip() to remove the "\n" at the end of the lines you want to keep. Then loop through the temporary list (blist) and build your final list.

alist = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

# Use a list comprehension to drop the bare '\n' items
# and .strip() to remove the trailing '\n' from the remaining strings.
blist = [i.strip() for i in alist if i != '\n']

# Pair up consecutive items; this assumes every group has exactly two lines.
final_list = []
for i in range(0, len(blist), 2):
    final_list.append([blist[i], blist[i + 1]])
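This approach relies on every group having the same number of lines. A small sketch that makes that assumption explicit and fails loudly when it does not hold is shown here. `chunk` is a hypothetical helper name and `lines` is shortened illustrative data.

```python
def chunk(items, size):
    """Hypothetical helper: split items into consecutive lists of `size`."""
    if len(items) % size:
        raise ValueError(
            f"{len(items)} items cannot be split into groups of {size}"
        )
    return [items[i:i + size] for i in range(0, len(items), size)]


lines = ['chain A\n', '27\n', '\n', 'chain B\n', '27\n', '\n']
kept = [s.strip() for s in lines if s != '\n']
pairs = chunk(kept, 2)
print(pairs)  # [['chain A', '27'], ['chain B', '27']]
```

If the groups can vary in length, splitting on the "\n" separator itself is the safer choice.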

You could use groupby:

from itertools import groupby

seq = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']

result = [
    [string.rstrip() for string in group]
    for key, group in groupby(seq, lambda s: s != "\n")
    if key
]

Result:

[
    ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
    ['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']
]
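To see what groupby produces before the `if key` filter is applied, it can help to materialise the groups. This is a small sketch on made-up data (`data` and `runs` are illustrative names), with two consecutive separators included on purpose.

```python
from itertools import groupby

data = ['a\n', 'b\n', '\n', '\n', 'c\n', '\n']

# Materialise each group so the (key, group) pairs can be inspected.
runs = [(key, list(group)) for key, group in groupby(data, lambda s: s != '\n')]
print(runs)
# [(True, ['a\n', 'b\n']), (False, ['\n', '\n']), (True, ['c\n']), (False, ['\n'])]

# Keeping only the runs whose key is True discards the separators.
result = [[s.rstrip() for s in group] for key, group in runs if key]
print(result)  # [['a', 'b'], ['c']]
```

Because groupby merges consecutive items with the same key, repeated "\n" separators collapse into a single run, so no empty sublists appear in the result.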


 