Efficient way to convert a dictionary of lists into paired lists of keys and values

Question

I have a dictionary of lists like the one below (it can have more than 1M elements; also assume the dictionary is sorted by key):

import scipy.sparse as sp
d = {0: [0,1], 1: [1,2,3], 
     2: [3,4,5], 3: [4,5,6], 
     4: [5,6,7], 5: [7], 
     6: [7,8,9]}

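I would like to know the most efficient way (the fastest for a large dictionary) to convert it into lists of row and column indices, for example: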

r_index = [0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6]
c_index = [0, 1, 1, 2, 3, 3, 4, 5, 4, 5, 6, 5, 6, 7, 7, 7, 8, 9]

Here are some solutions I have so far:

Using iteration

row_ind = [k for k, v in d.iteritems() for _ in range(len(v))]  # or d.items() in Python 3
col_ind = [i for ids in d.values() for i in ids]

Using the pandas library

import pandas as pd
df = pd.DataFrame.from_dict(d, orient='index')
df = df.stack().reset_index()
row_ind = list(df['level_0'])
col_ind = list(df[0])

Using itertools

import itertools
import numpy as np  # needed for np.array below
indices = [(x, y) for x, y in itertools.chain.from_iterable(
    [itertools.product((k,), v) for k, v in d.items()])]
indices = np.array(indices)
row_ind = indices[:, 0]
col_ind = indices[:, 1]

I am not sure which of these is the fastest when the dictionary has many elements. Thanks!

Answer 1

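The first rule of thumb for optimizing in Python is to make sure the innermost loop is handed off to a library function. This applies only to CPython; PyPy is a completely different story. In your case, using extend gives a significant speedup.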

import time
l = range(10000)
x = dict([(k, list(l)) for k in range(1000)])

def org(d):
    row_ind = [k for k, v in d.items() for _ in range(len(v))]
    col_ind = [i for ids in d.values() for i in ids]

def ext(d):
    row_ind = [k for k, v in d.items() for _ in range(len(v))]
    col_ind = []
    for ids in d.values():
        col_ind.extend(ids)

def ext_both(d):
    row_ind = []
    for k, v in d.items():
        row_ind.extend([k] * len(v))
    col_ind = []
    for ids in d.values():
        col_ind.extend(ids)

functions = [org, ext, ext_both]
for func in functions:
    begin = time.time()
    func(x)
    elapsed = time.time() - begin
    print(func.__name__ + ": "  + str(elapsed))

Output when using Python 2:

org: 0.512559890747
ext: 0.340406894684
ext_both: 0.149670124054
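The benchmark functions above build the lists but never return them. A minimal sketch of the ext_both idea that returns both lists, checked against the example dictionary from the question, might look like this. The name to_indices is hypothetical, and the two loops are merged into a single pass, which is a small variation on the benchmarked code rather than the exact function that was timed.

```python
def to_indices(d):
    """Flatten a dict of lists into parallel row/column index lists."""
    row_ind = []
    col_ind = []
    for k, v in d.items():
        # list.extend iterates in C, so no Python-level inner loop is needed
        row_ind.extend([k] * len(v))
        col_ind.extend(v)
    return row_ind, col_ind


# Example dictionary from the question
d = {0: [0, 1], 1: [1, 2, 3],
     2: [3, 4, 5], 3: [4, 5, 6],
     4: [5, 6, 7], 5: [7],
     6: [7, 8, 9]}
row_ind, col_ind = to_indices(d)
```

This relies on the dictionary preserving insertion order, which is guaranteed from Python 3.7 onward.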

Answer 2

You can vary the input size of the benchmark (the code below is Python 2):

import time
l = xrange(10000)
x = dict([(k, list(l)) for k in xrange(1000)])


def f(d):
    row_ind = [k for k, v in d.iteritems() for _ in range(len(v))]
    col_ind = [i for ids in d.values() for i in ids]


def ff(d):
    import pandas as pd
    df = pd.DataFrame.from_dict(d, orient='index')
    df = df.stack().reset_index()
    row_ind = list(df['level_0'])
    col_ind = list(df[0])


def fff(d):
    import itertools
    import numpy as np
    indices = [(x, y) for x, y in itertools.chain.from_iterable(
        [itertools.product((k,), v) for k, v in d.items()])]
    indices = np.array(indices)
    row_ind = indices[:, 0]
    col_ind = indices[:, 1]

alternatives = [f, ff, fff]
for func in alternatives:
    begin = time.time()
    func(x)
    print time.time() - begin

Output:

0.977538108826
5.26920008659
6.98472499847

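With the current sample size, the first approach appears to be the best. However, if you have more time to choose the sample size and wait for the runs to finish, the results may differ. It would be better to use a library.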
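One library-based variant that the benchmark above does not cover is a standard-library itertools version that avoids building intermediate tuples. The sketch below is an assumption about what such a variant could look like, not something that was measured in this thread; the name to_indices_itertools is hypothetical, and whether it beats the list comprehension depends on the data.

```python
from itertools import chain, repeat


def to_indices_itertools(d):
    """Build row/column index lists using only itertools."""
    # repeat(k, len(v)) yields the key once per element of its list
    row_ind = list(chain.from_iterable(
        repeat(k, len(v)) for k, v in d.items()))
    # chain.from_iterable concatenates all value lists lazily
    col_ind = list(chain.from_iterable(d.values()))
    return row_ind, col_ind


rows, cols = to_indices_itertools({0: [0, 1], 5: [7]})
```

It can be dropped into the alternatives list of the benchmark above to compare it on your own input sizes.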

Answer 3

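There is a feature called a decorator. A decorator always sits directly above a def or class definition. In your code, use import timer and @timer.Timer() or something similar. You can Google for more, or go to this link: https://wiki.python.org/moin/PythonDecorators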
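As a rough illustration of the decorator idea, here is a minimal timing decorator written with the standard library only. The names timed and col_indices are hypothetical and are not the timer module mentioned in the answer; this is a sketch, not a replacement for a proper benchmarking tool such as timeit.

```python
import functools
import time


def timed(func):
    """Print and record how long each call to func takes."""
    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        begin = time.perf_counter()
        result = func(*args, **kwargs)
        wrapper.elapsed = time.perf_counter() - begin
        print(f"{func.__name__}: {wrapper.elapsed:.6f} s")
        return result
    wrapper.elapsed = None  # no call has been timed yet
    return wrapper


@timed
def col_indices(d):
    col_ind = []
    for ids in d.values():
        col_ind.extend(ids)
    return col_ind


cols = col_indices({0: [0, 1], 1: [1, 2, 3]})
```

The elapsed time of the most recent call stays available as col_indices.elapsed, so results can be collected without parsing the printed output.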

Answer 1
Score 2, accepted, 2016-06-16 19:41:27

Answer 2
Score 0, 2016-06-16 18:56:39

Answer 3
Score -2, 2016-08-10 17:44:45
