[英]Subset a list in python on pre-defined string
我有一些非常大的字符串列表需要解析。 我需要根據預定義的字符串將它們分成更小的列表,並且我想出了一種方法,但我擔心這不會對我的真實數據產生影響。 有一個更好的方法嗎?
我的目標是打開這個列表:
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
進入此列表:
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
我嘗試了什么:
# List that replicates my data. `string_to_split_on` is a fixed character string I want to break my list up on
my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
# Inspect List
print(my_list)
# Create empty lists to store dat ain
new_list = []
good_letters = []
# Iterate over each string in the list
for i in my_list:
# If the string is the seporator, append data to new_list, reset `good_letters` and move to the next string
if i == 'string_to_split_on':
new_list.append(good_letters)
good_letters = []
continue
# Append letter to the list of good letters
else:
good_letters.append(i)
# I just like printing things thay because its easy to read
for item in new_list:
print(item)
print('-'*100)
### Output
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
['a', 'b']
----------------------------------------------------------------------------------------------------
['c', 'd', 'e', 'f', 'g']
----------------------------------------------------------------------------------------------------
['h', 'i', 'j', 'k']
----------------------------------------------------------------------------------------------------
您也可以使用一行代碼:
original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
split_string = 'string_to_split_on'
new_list = [sublist.split() for sublist in ' '.join(original_list).split(split_string) if sublist]
print(new_list)
這種方法在處理大數據集時更有效:
import itertools
new_list = [list(j) for k, j in itertools.groupby(original_list, lambda x: x != split_string) if k]
print(new_list)
[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.