Managing a large list of points: what is the best approach?

Question

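I ran into performance problems in my code, decided to rewrite it, and would like some advice on how to approach this. I have a large list of optical flow data, made up of lists of entries that each hold a frame number and X and Y coordinates. Like this: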

[[[frame,x,y],[frame,x,y]],[[frame,x,y],[frame,x,y]]...]

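I uploaded a sample here: http://pastebin.com/ANUr8bwc

I need a way to manage this data so that I can quickly look up which lists contain certain frames.

So far I have looped over all the data to see which lists contain frames 34 and 35, and then indexed them into a new list for reference.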

thisFrame = 34
nextFrame = thisFrame + 1
if any(e[0] == thisFrame for e in item) and any(e[0] == nextFrame for e in item): #Check if the item contains this frame and next
    DoStuff()

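Doing this thousands of times for 10,000+ points quickly becomes a bottleneck. So my idea was to build a dict keyed by frame, which would make it easy to find the items available on a specific frame:

{frame: [list1, list2, list3]}

But this time I thought I had better ask first. Is there a good go-to method for storing large datasets and looking things up in them, so that I don't have to iterate over all the data every time I need something?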

Answer 1

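Dictionaries are among the most heavily optimized data structures in Python. Converting your data to a dictionary takes some time up front, but after that a lookup (e.g. thisFrame in item_dict) runs in O(1) time on average, which is much faster than scanning a list.

Your best bet is to convert the data to a dictionary with the following code:

Note: your list appears to be nested twice. If that is not the case, you will have to adjust the iteration slightly.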

item_dict = {}
for big_lst in item:
    for lst in big_lst:
        try:
            item_dict[lst[0]] += [lst[1:],] # Append to existing value
        except KeyError:
            item_dict[lst[0]] = [lst[1:],] # Initialize value

Edit 02/05: try/except is faster
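For comparison, the same frame index can be built with collections.defaultdict, which removes the need for try/except or an explicit membership test. This is only a sketch: the sample data and the helper name build_frame_index are made up for illustration and assume the doubly nested [frame, x, y] layout from the question.

```python
from collections import defaultdict

# Hypothetical sample in the question's layout: a list of tracks,
# each track being a list of [frame, x, y] entries.
item = [
    [[34, 1.0, 2.0], [35, 1.5, 2.5]],
    [[35, 7.0, 8.0], [36, 7.5, 8.5]],
]

def build_frame_index(tracks):
    """Map each frame number to the list of [x, y] pairs seen on that frame."""
    index = defaultdict(list)
    for track in tracks:
        for frame, x, y in track:
            index[frame].append([x, y])
    # Return a plain dict so that a missing frame raises KeyError
    # instead of silently creating an empty entry.
    return dict(index)

item_dict = build_frame_index(item)
print(item_dict[35])  # [[1.5, 2.5], [7.0, 8.0]]
```

Whether this is faster than try/except depends on the data, so measure on your own set before switching.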

item_dict then looks like this, with duplicate frames merged, so looking up a single frame returns a list of [x, y] pairs.

item_dict = {
    1: [list1, list2, list3],
    2: [list1, list2],
    3: [list1],
    4: [list1, list2, list3],
}

From then on, lookups are very fast:

thisFrame = 34
nextFrame = thisFrame + 1
if thisFrame in item_dict and nextFrame in item_dict:
    foo = item_dict[thisFrame] # e.g. [list1, list2]
    bar = item_dict[nextFrame] # e.g. [list1, list2, list3]
    DoStuff()

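If you need to keep track of which parent list each [x, y] pair came from, you can append an extra element to each entry that stores the index of the parent list in item: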

item_dict = {}
for list_index, big_lst in enumerate(item):
    for lst in big_lst:
        if lst[0] in item_dict:
            item_dict[lst[0]] += [lst[1:]+[list_index],] # Append
        else:
            item_dict[lst[0]] = [lst[1:]+[list_index],] # Initialize

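Then something like

parent_list = item_dict[thisFrame][0][2] # entries are [x, y, list_index]; this takes the first entry on the frame

returns the index of the parent list, which you can then access with:

item[parent_list]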
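If what you really need is the set of lists that contain both frame 34 and frame 35, as in the question, one option is to index frame numbers to sets of parent-list indices and intersect them. The sketch below assumes the same nested layout; the sample data and the names build_track_index and tracks_on_frames are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample: three tracks, each a list of [frame, x, y] entries.
item = [
    [[33, 0.0, 0.0], [34, 1.0, 2.0], [35, 1.5, 2.5]],  # track 0
    [[35, 7.0, 8.0], [36, 7.5, 8.5]],                  # track 1
    [[34, 4.0, 4.0], [35, 4.2, 4.1]],                  # track 2
]

def build_track_index(tracks):
    """Map each frame number to the set of track indices that contain it."""
    index = defaultdict(set)
    for track_index, track in enumerate(tracks):
        for entry in track:
            index[entry[0]].add(track_index)
    return index

def tracks_on_frames(index, *frames):
    """Return the sorted indices of tracks present on every given frame."""
    if not frames:
        return []
    # .get() avoids creating empty entries for frames that never occur.
    sets = [index.get(frame, set()) for frame in frames]
    return sorted(set.intersection(*sets))

frame_to_tracks = build_track_index(item)
this_frame = 34
both = tracks_on_frames(frame_to_tracks, this_frame, this_frame + 1)
print(both)  # [0, 2]
```

The index is built once in a single pass over the data. Each later query only touches the sets for the requested frames, so the any(...) scan over every list is no longer needed.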

Answer 2

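Here is what I tried to do:

First, I merge all x, y values under their unique frame by building a dictionary called frame. Second, I invert the dictionary by turning the values into keys and the keys into values.

Please let me know whether this works for you. If not, I will modify or delete it.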

#!/usr/bin/env python3

import ast

dict_frame = dict()

def generate_datastructure():
    # Parse the nested list literal stored in the data file
    with open('2d_optical_tracking_data.txt', 'r') as fh:
        l = ast.literal_eval(fh.read())
    # Merge all x, y values under their frame number
    frame = dict()
    for ls in l:
        for elm in ls:
            key = elm[0]
            val_list = elm[1:]
            frame.setdefault(key, [])
            frame[key].extend(val_list)

    # Invert: the tuple of unique values becomes the key, the frame the value
    for key, val in frame.items():
        dict_frame[tuple(set(val))] = key

def lookup(key):
    print(dict_frame)

if __name__ == '__main__':
    tofind = '45.835999'  # must have the same type as the stored values
    generate_datastructure()
    for key, val in dict_frame.items():
        if tofind in key:
            print(val)
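The inversion step can also be done per value, so that finding the frames for a coordinate is a direct dictionary lookup and not a loop over every key. This is a variation on the idea above, not the code from this answer: the frame dictionary is a made-up sample and the function name invert is hypothetical. Floats are matched exactly here, so it only helps when you search for values exactly as they are stored.

```python
from collections import defaultdict

# Hypothetical result of the merge step: frame number -> flat list of x, y values.
frame = {
    34: [1.0, 2.0, 4.0, 4.0],
    35: [1.5, 2.5, 4.0, 4.1],
}

def invert(frame_to_values):
    """Map each coordinate value to the set of frames on which it occurs."""
    value_to_frames = defaultdict(set)
    for frame_number, values in frame_to_values.items():
        for value in values:
            value_to_frames[value].add(frame_number)
    return value_to_frames

value_to_frames = invert(frame)
print(sorted(value_to_frames[4.0]))  # [34, 35]
```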


Answer 1: 1, accepted, 2014-02-03 07:32:08

Answer 2: 1, 2014-02-03 07:45:03
