簡體   English   中英

pythonic方法優化邏輯以從列表中過濾/提取數據

[英]pythonic way to optimize the logic to filter/extract data from list

我有一個如下列表:

['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

從這個列表中我想要三個不同的列表作為結果。 我想在列表上使用單次迭代得到結果。

  1. 所有uid列表,即[3234,3235,3236,3237,3241 .....]
  2. Seen uids列表,即[3234,3235 ...] < - 具有\\ Seen Flag的項目的uid
  3. 已刪除的uid列表,即[3236,3253] < - 具有\\已刪除標志的項目的uid

最好的辦法是將數據轉換為將UID映射到FLAGS的dict ,然后搜索它將很容易。 所以數據看起來像這樣:

{'3254': '', '3304': '', '3236': '\\Deleted', '3237': '-FLAGS \\Seen +FLAGS', '3234': 'seen \\Seen', '3235': '\\Seen', '3430': '\\Seen', '3431': '', '3252': '\\Seen', '3253':'\\Deleted', '3478': '', '3479': '', '3256': '\\Seen', '3481': '', '3480': '', '3318': '\\Seen', '3434': '\\Seen', '3243': '\\Seen', '3242': '\\Seen', '3241': '-FLAGS \\Seen +FLAGS', '3247': '\\Seen', '3245': '\\Seen', '3244': '\\Seen', '3447': '-FLAGS \\Seen +FLAGS'}

您可以使用正則表達式來匹配列表中的每個條目。 如果我們讓regexp在匹配中返回兩個組,我們可以輕松地構建dict

所以我們最終得到這樣的東西:

items = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))', '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))', '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))',  '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))',  '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', '11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', '13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', '16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', '18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', '20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', '22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

import re
pattern = re.compile(r"\d+ \(UID (\d+) FLAGS \(([^)]*)\)\)")
values = dict(pattern.match(item).groups() for item in items)

然后,我們可以輕松查詢values的項目以獲得您想要的內容:

print "All UIDs:",values.keys()
print "Seen UIDs:",[uid for uid,flags in values.iteritems() if r"\Seen" in flags]
print "Deleted UIDs:",[uid for uid,flags in values.iteritems() if r"\Deleted" in flags]
import re

data = ['1 (UID 3234 FLAGS (seen \\Seen))', '2 (UID 3235 FLAGS (\\Seen))',
 '3 (UID 3236 FLAGS (\\Deleted))', '4 (UID 3237 FLAGS (-FLAGS \\Seen +FLAGS))',
 '5 (UID 3241 FLAGS (-FLAGS \\Seen +FLAGS))', '6 (UID 3242 FLAGS (\\Seen))', 
 '7 (UID 3243 FLAGS (\\Seen))', '8 (UID 3244 FLAGS (\\Seen))', 
 '9 (UID 3245 FLAGS (\\Seen))', '10 (UID 3247 FLAGS (\\Seen))', 
'11 (UID 3252 FLAGS (\\Seen))', '12 (UID 3253 FLAGS (\\Deleted))', 
'13 (UID 3254 FLAGS ())', '14 (UID 3256 FLAGS (\\Seen))', '15 (UID 3304 FLAGS ())', 
'16 (UID 3318 FLAGS (\\Seen))', '17 (UID 3430 FLAGS (\\Seen))', 
'18 (UID 3431 FLAGS ())', '19 (UID 3434 FLAGS (\\Seen))', 
'20 (UID 3447 FLAGS (-FLAGS \\Seen +FLAGS))', '21 (UID 3478 FLAGS ())', 
'22 (UID 3479 FLAGS ())', '23 (UID 3480 FLAGS ())', '24 (UID 3481 FLAGS ())']

r = re.compile('\d+\s\(UID\s(?P<uid>\d+)\sFLAGS\s\((?P<data>.*)\)\)')
uid_list = []
seen_uid_list = []
deleted_uid_list = []
for s in data:
    m = r.match(s)
    if m:
        uid_list.append(m.group('uid'))
        if m.group('data').rfind('Seen') > 0: seen_uid_list.append(m.group('uid'))
        if m.group('data').rfind('Deleted') > 0: deleted_uid_list.append(m.group('uid'))

print uid_list
print seen_uid_list
print deleted_uid_list

我不確定列表推導,因為那些通常將一個列表映射到另一個列表(使用過濾或映射)。 我沒有看到它們被用來拆分列表。 但是,您可以在單次迭代中結合使用genexp和循環。 我已經把它吹了一點,所以它很清楚。

import re
grepper = re.compile(r'[0-9]+ \(UID (?P<uid>[0-9]+) FLAGS (?P<flags>\(.*\))\)')

t = [..] #your list

items = (grepper.search(m).groupdict() for m in t)

all = []
seen = []
deleted = []
for i in items:
  if "Seen" in i:
    seen.append(i["uid"])
  if "Deleted" in i:
    deleted.append(i["uid"])
  all.append(i["uid"])

你現在應該有3個清單。

all,deleted,seen = [list(filter(None, a)) for a in \
    zip(*map(lambda a: (a[2], '\Deleted' in a[-1] and a[2], '\Seen' in  a[-1] and a[2]), map(lambda a: a.split(' '), items)))]

使用re或沒有re會更快 - 你需要檢查timeit !!!

這個適用於您的數據樣本....

uids, seen, deleted = [], [], []
for item in myList:
    uids.append(int(item[7:12]))
    if 'Se' in item[20:]:  seen.append(uids[-1])
    elif 'De' in item[20:]: deleted.append(uids[-1])
all=[]
seen=[]
deleted=[]
for item in alist:
    s=item.split(" ",4)
    all.append(s[2])
    if "seen" in s[-1].lower():
        seen.append(s[2])
    elif "delete" in s[-1].lower():
        deleted.append(s[2])

我可以想到在生成您要求的三個列表的一次迭代中執行此操作的唯一方法是手動迭代。 沒有python魔法,我可以想出來。

如果您了解有關格式及其生成方式的詳細信息,則可以輕松改進此項。 我不知道為什么+ FLAGS和-FLAGS在某些項目中,例如,並且不知道何時需要括號,所以我不得不使用find()。 另外,我可以將字符串split()分成兩部分,但話又說回來,我不知道標志格式是什么意思,...

def parseList(l):
    lall = []
    lseen = []
    ldeleted = []

    for item in l:
        spl = item.split()

        uid = int(spl[2])

        lall.append(uid)

        for word in spl[4:]:
            if word.find("\Seen") != -1:
                lseen.append(uid)

            elif word.find("\Deleted") != -1:
                ldeleted.append(uid)

    return lall, lseen, ldeleted

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM