I need to split a Python list into several Python lists, but the new lists need to contain the fields between certain strings
[英]Python split a list into several lists for each new line character
I have the following list, and I want to split it into several lists wherever an element of the list is "\n".
Input:
['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']
Expected output:
[
['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']
]
I tried stripping the "\n" from the end of the elements that have it, and used a modified version of the accepted answer from this post:
for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = lst[i].rstrip("\n")
[item.split(",") for item in ','.join(lst).split('\n') if item]
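But because I join and split on a comma rather than a single space, I end up with "" (empty strings) in the results after splitting into several lists. How can I prevent this?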
[
['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290','27',''],
['','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064','27','']
]
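If you want to keep the question's join-and-split approach, one minimal fix is to drop the empty strings after the inner split. This is a sketch that assumes no field contains a comma and no field is meant to be empty; if either can happen, splitting the list directly (as in the answers that follow) is safer.

```python
lst = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n', '\n',
       'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n', '27\n', '\n']

# Strip the trailing newline from every element except the bare '\n' separators
for i, n in enumerate(lst):
    if n != "\n":
        lst[i] = n.rstrip("\n")

# Join with commas, split on the '\n' separators, then discard the empty
# strings left by the commas on either side of each separator.
# Assumption: no field contains ',' and no field is intentionally empty.
result = [
    [field for field in chunk.split(",") if field]
    for chunk in ",".join(lst).split("\n")
    if chunk
]
print(result)
```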
Does this work for you?
list1 = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']
list2 = []
tmp = []
for item in list1:
    if item != '\n':
        tmp.append(item.rstrip('\n'))
    else:
        # Note we aren't actually processing this item of the input list, as '\n' by itself is unwanted
        list2.append(tmp)
        tmp = []
# Keep the last group too, in case the input does not end with '\n'
if tmp:
    list2.append(tmp)
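The same loop can also be packaged as a generator. This is a sketch, and `split_on_blank` is a hypothetical name, not an existing library function. It yields the final group even when the input does not end with '\n', and it skips the empty groups that consecutive separators would otherwise produce:

```python
from typing import Iterable, Iterator, List


def split_on_blank(lines: Iterable[str], sep: str = "\n") -> Iterator[List[str]]:
    """Yield groups of lines separated by elements equal to `sep` (hypothetical helper)."""
    group: List[str] = []
    for line in lines:
        if line == sep:
            # A separator closes the current group; skip it if nothing was collected
            if group:
                yield group
            group = []
        else:
            group.append(line.rstrip("\n"))
    # Flush the last group when the input does not end with a separator
    if group:
        yield group


list1 = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n', '\n',
         'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n', '27\n', '\n']
print(list(split_on_blank(list1)))
```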
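I'd suggest using more_itertools.split_at to split your list.
Because your original list ends with the separator '\n', splitting it produces an empty sublist as the last item. The if check excludes it.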
from more_itertools import split_at
original = [
    'chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n',
    '27\n',
    '\n',
    'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n',
    '27\n',
    '\n'
]
processed = [
    [item.rstrip() for item in sublist]
    for sublist in split_at(original, lambda i: i == '\n')
    if sublist
]
print(processed)
Output (line breaks added for clarity):
[['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']]
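To see why the `if sublist` check is needed, here is a minimal pure-Python stand-in for the default behaviour of `split_at` (no `maxsplit`, separators dropped). It is a sketch for illustration only; `split_at_sketch` is a hypothetical name, not the library's own code. The key point is that the buffer collected after the last separator is always yielded, even when it is empty:

```python
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def split_at_sketch(iterable: Iterable[T], pred: Callable[[T], bool]) -> Iterator[List[T]]:
    """Yield sublists split wherever pred(item) is true; separator items are dropped."""
    buf: List[T] = []
    for item in iterable:
        if pred(item):
            yield buf
            buf = []
        else:
            buf.append(item)
    # The trailing buffer is always yielded, so a final separator leaves an empty list
    yield buf


print(list(split_at_sketch(['a\n', '27\n', '\n', 'b\n', '27\n', '\n'], lambda i: i == '\n')))
# [['a\n', '27\n'], ['b\n', '27\n'], []]
```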
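Use a list comprehension to remove the \n items from the list, and use .strip() to remove the \n at the end of the lines you want to keep. Then iterate over the temporary list (blist) and build the final list.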
alist = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']
# use a list comprehension to remove the bare \n items from the list
# and use .strip() to remove the trailing \n in the strings
blist = [i.strip() for i in alist if i != '\n']
final_list = []
# step through blist two items at a time; assumes every group is exactly two lines
for i in range(0, len(blist), 2):
    final_list.append([blist[i], blist[i+1]])
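Since this answer relies on every group having exactly two lines, the index arithmetic can also be written with the common `zip(it, it)` idiom, which pulls two consecutive items at a time from one shared iterator. This is a sketch with the same fixed-size assumption; one difference is that with an odd number of lines the loop above raises `IndexError`, whereas `zip` silently drops the unpaired last line:

```python
alist = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n', '\n',
         'chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n', '27\n', '\n']

# Drop the bare '\n' separators and strip the trailing newlines
blist = [s.strip() for s in alist if s != '\n']

# zip(it, it) takes two items at a time from the same iterator.
# Assumes exactly two lines per group; an unpaired final line is dropped.
it = iter(blist)
final_list = [list(pair) for pair in zip(it, it)]
print(final_list)
```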
You can use groupby:
from itertools import groupby
seq = ['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290\n', '27\n','\n','chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064\n','27\n','\n']
result = [
    [string.rstrip() for string in group]
    for key, group in groupby(seq, lambda s: s != "\n")
    if key
]
Result:
[
['chain 2109 chrY 59373566 + 1266734 1266761 chrX 156040895 + 1198245 1198272 20769290', '27'],
['chain 2032 chrY 59373566 + 1136192 1136219 chrX 156040895 + 1086629 1086656 4047064', '27']
]
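For reference, `groupby` groups *consecutive* items that share the same key, so with the key `s != "\n"` the separators form their own runs with key `False`, which the `if key` clause discards. A side effect is that doubled separators collapse into a single `False` run instead of producing an empty group. The sketch below uses placeholder data (not from the question) to print the runs; each group is converted with `list()` immediately because a group is a lazy view that stops being usable once `groupby` advances:

```python
from itertools import groupby

# Placeholder data; note the doubled '\n' separator between the two records
seq = ['chain A\n', '27\n', '\n', '\n', 'chain B\n', '27\n', '\n']

# Materialise each group right away; groupby's groups share the
# underlying iterator and are invalidated when it moves on.
runs = [(key, list(group)) for key, group in groupby(seq, lambda s: s != "\n")]
for key, items in runs:
    print(key, items)

# Keep only the runs of real lines (key is True) and strip their newlines
result = [[s.rstrip() for s in items] for key, items in runs if key]
print(result)  # [['chain A', '27'], ['chain B', '27']]
```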
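Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.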