
Find and extract patterned strings from a list in Python 3

I have a list in Python 3 that looks like this:

list1 = ['1128=9,9=639, 75=20140110,268=6,START,22=8,48=49798,83=63663,271=7,1020=7,5799=1,START,48=49798,83=63664,451=0,1003=2,5799=1','1128=9,9=6389, 75=20140119, START, 22=8,48=49798, 271=0.75,1020=7,5799=1,START,22=8,48=49798,83=63664,451=0,1020=10,5799=1,START,22=8,48=49798,271=63664,451=0,1020=10,5799=1']

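list1 has length 2.

First, I want to extract all the useful strings and drop everything else.

I want to keep only the fields containing 52=, START, 75=, 271= and 451=.

The desired output is then: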

list2 = ['75=20140110, START,271=7,START,451=0','75=20140119, START, 271=0.75,START,451=0, START, 271=63664,451=0']

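In the final step, I want to split the list and create a new list.

In each element, I want to prepend the "75=..." substring to each substring that follows the word "START".

The desired output looks like this: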

list3 = ['75=20140110, START,271=7', '75=20140110,START,451=0','75=20140119, START, 271=0.75','75=20140119,START,451=0', '75=20140119,START, 271=63664,451=0']

Now it is a list of 5 elements: the first element of list2 contains 2 START substrings, and the second element contains 3.
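As a quick check of these counts, a minimal sketch: the number of START markers in each list2 element predicts how many list3 elements it produces. The variable name start_counts is just illustrative.

```python
# list2 as given in the question (desired output of the first step)
list2 = ['75=20140110, START,271=7,START,451=0',
         '75=20140119, START, 271=0.75,START,451=0, START, 271=63664,451=0']

# Each START marker should become one element of list3
start_counts = [s.count('START') for s in list2]
print(start_counts)       # [2, 3]
print(sum(start_counts))  # 5
```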

I'm new to Python and would really appreciate your help.

This should solve your first problem:

(You didn't specify whether your use case is sensitive to whitespace, so I ignored the spaces.)

list1 = [
    '1128=9,9=639, 75=20140110,268=6,START,22=8,48=49798,83=63663,271=7,1020=7,5799=1,START,48=49798,83=63664,451=0,1003=2,5799=1','1128=9,9=6389, 75=20140119, START, 22=8,48=49798, 271=0.75,1020=7,5799=1,START,22=8,48=49798,83=63664,451=0,1020=10,5799=1,START,22=8,48=49798,271=63664,451=0,1020=10,5799=1'
]

texts_to_keep = ['52=', 'START', '75=', '271=', '451=']

# Split the list on commas to work with the data easier
list1_split = [item.split(',') for item in list1]

# Create one empty sub-list for each element of list1
list1_new = [[] for _ in list1]
for items, list1_list in zip(list1_split, list1_new):
    # Grab each string in the sub list
    for item in items:
        # Now check if your substrings are in the original string
        for text_to_keep in texts_to_keep:
            # If it is, keep it, and stop checking so the item is added only once
            if text_to_keep in item:
                list1_list.append(item)
                break

final_list1 = [
    ','.join(sub_list) for sub_list in list1_new
]

This gives the output:

[' 75=20140110,START,271=7,START,451=0', ' 75=20140119, START, 271=0.75,START,451=0,START,271=63664,451=0']

You could do this with a list comprehension for performance, but it gets quite ugly, so I went with the simple implementation above.
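A sketch of a variant, assuming whitespace is not significant: it strips each field and compares the tag before "=" exactly, so a key like "75=" would not also match a field such as "175=...". KEEP_TAGS, keep_field and filter_fields are illustrative names, not part of the question.

```python
list1 = [
    '1128=9,9=639, 75=20140110,268=6,START,22=8,48=49798,83=63663,271=7,1020=7,5799=1,START,48=49798,83=63664,451=0,1003=2,5799=1',
    '1128=9,9=6389, 75=20140119, START, 22=8,48=49798, 271=0.75,1020=7,5799=1,START,22=8,48=49798,83=63664,451=0,1020=10,5799=1,START,22=8,48=49798,271=63664,451=0,1020=10,5799=1',
]

# Tags to keep, compared with the text before '=' (exact match, not substring)
KEEP_TAGS = {'52', '75', '271', '451'}

def keep_field(field):
    """Return True if a stripped field is START or carries one of the kept tags."""
    if field == 'START':
        return True
    tag, sep, _ = field.partition('=')
    return sep == '=' and tag in KEEP_TAGS

def filter_fields(record):
    # Strip surrounding spaces so ' START' and 'START' are treated alike
    fields = (f.strip() for f in record.split(','))
    return ','.join(f for f in fields if keep_field(f))

list2 = [filter_fields(r) for r in list1]
print(list2)
```

Unlike the loop above, this drops the leading spaces, so the first element comes out as '75=20140110,START,271=7,START,451=0'.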

As for your second question, as far as I can tell you sometimes add the substring '75=...' and sometimes don't, and I can't identify the pattern.
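One possible reading of the pattern, sketched under assumptions: every segment that begins at a START marker gets the element's 75= field prepended, and whitespace has already been stripped. Under that reading the result matches the question's list3 apart from its inconsistent spacing. split_on_start is a hypothetical helper name, and fields appearing before the first START (other than 75=) are dropped.

```python
# First-step output with whitespace stripped (assumption: spaces are not significant)
list2 = ['75=20140110,START,271=7,START,451=0',
         '75=20140119,START,271=0.75,START,451=0,START,271=63664,451=0']

def split_on_start(record):
    """Split one record at each START and prefix every segment with its 75= field."""
    fields = record.split(',')
    # Assumption: each record carries exactly one 75= field
    header = next(f for f in fields if f.startswith('75='))
    segments = []
    current = None
    for f in fields:
        if f == 'START':
            current = []           # a new segment begins at each START
            segments.append(current)
        elif current is not None:
            current.append(f)      # field belongs to the most recent START segment
    return [','.join([header, 'START'] + seg) for seg in segments]

# Flatten the per-record lists into one list
list3 = [item for record in list2 for item in split_on_start(record)]
print(list3)
```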

This should solve your first problem with the help of a list comprehension:

f = ['1128=9,9=639, 75=20140110,268=6,START,22=8,48=49798,83=63663,271=7,1020=7,5799=1,START,48=49798,83=63664,'
     '451=0,1003=2,5799=1',
     '1128=9,9=6389, 75=20140119, START, 22=8,48=49798, 271=0.75,1020=7,5799=1,START,22=8,48=49798,83=63664,'
     '451=0,1020=10,5799=1,START,22=8,48=49798,271=63664,451=0,1020=10,5799=1']

def convert(li):
    text = ['52=', 'START', '75=', '271=', '451=']
    return [", ".join([y for y in x.split(',') for z in text if z in y]) for x in li]

print(convert(f))
#output [' 75=20140110, START, 271=7, START, 451=0', ' 75=20140119,  START,  271=0.75, START, 451=0, START, 271=63664, 451=0']
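A small caveat on the comprehension above: the nested `for z in text` clause emits a field once per key it contains, so a field containing two keys would be kept twice. A sketch of one way around this, filtering with any() instead; convert_any and TEXT are illustrative names. On the question's data it gives the same output as convert.

```python
f = ['1128=9,9=639, 75=20140110,268=6,START,22=8,48=49798,83=63663,271=7,1020=7,5799=1,START,48=49798,83=63664,'
     '451=0,1003=2,5799=1',
     '1128=9,9=6389, 75=20140119, START, 22=8,48=49798, 271=0.75,1020=7,5799=1,START,22=8,48=49798,83=63664,'
     '451=0,1020=10,5799=1,START,22=8,48=49798,271=63664,451=0,1020=10,5799=1']

TEXT = ['52=', 'START', '75=', '271=', '451=']

def convert_any(li):
    # any() keeps each field at most once, however many keys it contains
    return [", ".join(y for y in x.split(',') if any(z in y for z in TEXT)) for x in li]

print(convert_any(f))
# A contrived field containing two keys is kept once, not twice:
print(convert_any(['271=451=0']))  # ['271=451=0']
```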


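Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM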