
Subset a list in python on pre-defined string

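I have some very large lists of strings that I need to parse. I need to break them into smaller lists based on a pre-defined string. I figured out a way to do it, but I worry that it will not perform well on my real data. Is there a better way?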

My goal is to turn this list:

['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']

into this list:

[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]

What I tried:

# List that replicates my data.  `string_to_split_on` is a fixed character string I want to break my list up on 
my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']

# Inspect List
print(my_list)

# Create empty lists to store data in
new_list = []
good_letters = []

# Iterate over each string in the list
for i in my_list:

    # If the string is the separator, append data to new_list, reset `good_letters` and move to the next string
    if i == 'string_to_split_on':
        new_list.append(good_letters)
        good_letters = []
        continue

    # Append letter to the list of good letters
    else:
        good_letters.append(i)



# I just like printing things this way because it's easy to read
for item in new_list:
    print(item)
    print('-'*100)


### Output
['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
['a', 'b']
----------------------------------------------------------------------------------------------------
['c', 'd', 'e', 'f', 'g']
----------------------------------------------------------------------------------------------------
['h', 'i', 'j', 'k']
----------------------------------------------------------------------------------------------------

You can also do it in one line, provided no element contains whitespace or the separator text:

original_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g', 'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
split_string = 'string_to_split_on'

new_list = [sublist.split() for sublist in ' '.join(original_list).split(split_string) if sublist]
print(new_list)
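
The one-liner passes the data through a single joined string, which is why it depends on the elements being whitespace-free. A small sketch of what happens otherwise, using a made-up list (`items` is hypothetical and not taken from the question):

```python
split_string = 'string_to_split_on'

# Hypothetical data: the first element contains a space.
items = ['new york', 'b', split_string, 'c']

# Same join/split technique as the one-liner above.
joined = [s.split() for s in ' '.join(items).split(split_string) if s]
print(joined)
# 'new york' comes back as two separate elements: 'new' and 'york'
```

If your real data can contain spaces, a method that compares list elements directly avoids this problem.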

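This approach is better suited to large data sets, because `itertools.groupby` walks the list once and never builds an intermediate joined string: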

import itertools

new_list = [list(j) for k, j in itertools.groupby(original_list, lambda x: x != split_string) if k]
print(new_list)

[['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]
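
If you need this in several places, the single-pass idea can also be wrapped in a generator. This is only a sketch of one possible way to do it, not code from the question or the answers; the name `split_on` is hypothetical. It differs from the loop in the question in two ways: items that come after the last separator are still emitted, and empty groups (for example between two consecutive separators) are skipped, which matches the `groupby` version.

```python
def split_on(items, separator):
    """Yield sublists of `items` delimited by `separator`, in a single pass."""
    chunk = []
    for item in items:
        if item == separator:
            # Close the current group, skipping empty ones.
            if chunk:
                yield chunk
            chunk = []
        else:
            chunk.append(item)
    # Emit whatever follows the last separator.
    if chunk:
        yield chunk


my_list = ['a', 'b', 'string_to_split_on', 'c', 'd', 'e', 'f', 'g',
           'string_to_split_on', 'h', 'i', 'j', 'k', 'string_to_split_on']
result = list(split_on(my_list, 'string_to_split_on'))
print(result)
# [['a', 'b'], ['c', 'd', 'e', 'f', 'g'], ['h', 'i', 'j', 'k']]

# Items after the last separator are kept too.
tail = list(split_on(['a', 'sep', 'b'], 'sep'))
print(tail)
# [['a'], ['b']]
```

Because it is a generator, you can also iterate over the groups one at a time instead of calling `list(...)`, which avoids holding every sublist in memory at once.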

