
Fast way to loop over a whole Python dictionary

I have unordered data. Sometimes I want to analyse it by looking at all the entries, and at other times I want to pick just one entry.

p1   x1 x2 x3 x4
p2   x1 x2 x3 x4
p33  x1 x2 x3 x4
p3   x1 x2 x3 x4
p4   x1 x2 x3 x4

A dictionary seems like a nice format to store the data, since the data is not sorted, and if I want to get p33, which might be anywhere in the table, I can do that with dict["p33"]. This lookup will take some time, but I suppose it is faster than looping over the whole data to find the line that I want (at least this is the advantage I have been told a dict should buy me).
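To make the comparison concrete, here is a minimal sketch of the two ways of fetching p33. The numbers are made up for illustration; only the p1..p4 / x1..x4 layout comes from the table above, and find_by_scan is a hypothetical helper. A dict lookup hashes the key and takes roughly constant time on average, while the scan has to walk the rows until it hits a match.

```python
# Hypothetical sample rows mirroring the table above: (name, (x1, x2, x3, x4)).
rows = [
    ("p1", (1, 2, 3, 4)),
    ("p2", (5, 6, 0, 8)),
    ("p33", (9, 10, 11, 12)),
    ("p3", (13, 14, 15, 16)),
    ("p4", (17, 18, 0, 20)),
]

# Same data as a dictionary keyed by name.
table = dict(rows)


def find_by_scan(rows, name):
    """Linear search: compare each name until the wanted one is found."""
    for key, values in rows:
        if key == name:
            return values
    raise KeyError(name)


# Hash lookup: does not depend on where p33 sits in the data.
by_lookup = table["p33"]

# Linear scan: cost grows with the number of rows before the match.
by_scan = find_by_scan(rows, "p33")

print(by_lookup == by_scan)
```

Both calls return the same row; the difference only shows up in running time once the table gets large.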

If I want to look at the whole data, e.g. counting how many times x3 is zero, I have to loop over all the lines, and doing it with a for loop of the type for item in dict.keys(): is too slow. I have the impression that getting the keys and then doing dict[item] makes a lot of useless lookups, because each item has to be found in the dictionary again, whereas for my goal it would be good enough to read the data serially, "as if it were a list".
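For reference, the pattern described above and a variant that skips the per-key lookup can be sketched like this. The data and both function names are hypothetical. dict.values() (and dict.items() when the key is also needed) hands back the stored values directly, so no second lookup per entry is needed; how much time that saves in practice depends on the data and is not measured here.

```python
# Hypothetical sample data: name -> (x1, x2, x3, x4).
table = {
    "p1": (1, 2, 3, 4),
    "p2": (5, 6, 0, 8),
    "p33": (9, 10, 11, 12),
    "p3": (13, 14, 15, 16),
    "p4": (17, 18, 0, 20),
}


def count_zero_x3_via_keys(table):
    """Pattern from the question: iterate over keys, then look each one up."""
    count = 0
    for item in table.keys():
        if table[item][2] == 0:  # extra hash lookup for every entry
            count += 1
    return count


def count_zero_x3_via_values(table):
    """Iterate over the stored values directly, with no per-key lookup."""
    return sum(1 for values in table.values() if values[2] == 0)


print(count_zero_x3_via_keys(table), count_zero_x3_via_values(table))
```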

So I was wondering if there is a faster way to loop over all the entries of the dictionary.

Thanks

If it's possible, use numpy/pandas...

For me, Python is only for high-level programming and the low level belongs to C/C++... So if possible, use the existing compiled routines that ship with numpy, pandas or other libs.

Check it out...

>>> import numpy as np, pandas as pd
>>> p1 = np.arange(10)
>>> dct = dict(
... p1 = np.arange(10),
... p2 = np.ones(10),
... p3 = np.zeros(10),
... p33 = np.ones(10)*10,
... p4 = np.linspace(0,1,10))
>>>
>>> dct
{'p2': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]), 'p33': array([ 10.,  10.,  10.,  10.,  10.,
10.,  10.,  10.,  10.,  10.]), 'p1': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), 'p4': array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ]), 'p3': array([ 0.,  0.,  0.,  0.,  0.,
  0.,  0.,  0.,  0.,  0.])}
>>> from pprint import pprint as pr
>>> pr(dct)
{'p1': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 'p2': array([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.]),
 'p3': array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]),
 'p33': array([ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.]),
 'p4': array([ 0.        ,  0.11111111,  0.22222222,  0.33333333,  0.44444444,
        0.55555556,  0.66666667,  0.77777778,  0.88888889,  1.        ])}
>>> df = pd.DataFrame(dct)
>>> df
   p1   p2   p3   p33        p4
0   0  1.0  0.0  10.0  0.000000
1   1  1.0  0.0  10.0  0.111111
2   2  1.0  0.0  10.0  0.222222
3   3  1.0  0.0  10.0  0.333333
4   4  1.0  0.0  10.0  0.444444
5   5  1.0  0.0  10.0  0.555556
6   6  1.0  0.0  10.0  0.666667
7   7  1.0  0.0  10.0  0.777778
8   8  1.0  0.0  10.0  0.888889
9   9  1.0  0.0  10.0  1.000000
>>> df.T
        0          1          2          3          4          5          6  \
p1    0.0   1.000000   2.000000   3.000000   4.000000   5.000000   6.000000
p2    1.0   1.000000   1.000000   1.000000   1.000000   1.000000   1.000000
p3    0.0   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
p33  10.0  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
p4    0.0   0.111111   0.222222   0.333333   0.444444   0.555556   0.666667

             7          8     9
p1    7.000000   8.000000   9.0
p2    1.000000   1.000000   1.0
p3    0.000000   0.000000   0.0
p33  10.000000  10.000000  10.0
p4    0.777778   0.888889   1.0
>>> df = df.T
>>> df.columns = ['x%d'%(n+1) for n in df.columns.values]
>>> df
       x1         x2         x3         x4         x5         x6         x7  \
p1    0.0   1.000000   2.000000   3.000000   4.000000   5.000000   6.000000
p2    1.0   1.000000   1.000000   1.000000   1.000000   1.000000   1.000000
p3    0.0   0.000000   0.000000   0.000000   0.000000   0.000000   0.000000
p33  10.0  10.000000  10.000000  10.000000  10.000000  10.000000  10.000000
p4    0.0   0.111111   0.222222   0.333333   0.444444   0.555556   0.666667

            x8         x9   x10
p1    7.000000   8.000000   9.0
p2    1.000000   1.000000   1.0
p3    0.000000   0.000000   0.0
p33  10.000000  10.000000  10.0
p4    0.777778   0.888889   1.0
>>> df.x3
p1      2.000000
p2      1.000000
p3      0.000000
p33    10.000000
p4      0.222222
Name: x3, dtype: float64
>>> df.x3 == 0
p1     False
p2     False
p3      True
p33    False
p4     False
Name: x3, dtype: bool
>>> np.sum(df.x3 == 0)
1
>>>
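The same idea can be condensed into a short script that also covers the "pick just one entry" case from the question. This is only a sketch: the numbers are invented, and it assumes each dictionary value is one row of four x-values, so the frame is built row-wise with DataFrame.from_dict(..., orient="index") instead of transposing afterwards.

```python
import pandas as pd

# Hypothetical values; only the p1..p4 / x1..x4 layout comes from the question.
data = {
    "p1": [1, 2, 3, 4],
    "p2": [5, 6, 0, 8],
    "p33": [9, 10, 11, 12],
    "p3": [13, 14, 15, 16],
    "p4": [17, 18, 0, 20],
}

# orient="index" turns each dict key into a row label.
df = pd.DataFrame.from_dict(data, orient="index", columns=["x1", "x2", "x3", "x4"])

# Whole-table analysis: vectorised comparison, then count the True values.
zero_count = int((df["x3"] == 0).sum())

# Single-entry access by label, the counterpart of dict["p33"].
p33_row = df.loc["p33"]

print(zero_count)
print(p33_row.tolist())
```

The comparison and the sum run inside pandas/numpy rather than in a Python-level loop, which is the point made above about leaving the low-level work to the libraries.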


 