Is there a more Pythonic or more efficient way to loop over a dictionary containing lists than using for loops?

Question

After using get to extract information from an API in JSON format, I'm now attempting to calculate the average price in an efficient way.

data (example response from the API call):

...
{u'status': u'success', u'data': {u'context_id': u'2', u'app_id': u'123', u'sales': [{u'sold_at': 133, u'price': u'1.8500', u'hash_name': u'Xuan881', u'value': u'-1.00000'}, {u'sold_at': 139, u'price': u'2.6100', u'hash_name': u'Xuan881', u'value': u'-1.00000'},
... etc.

I have managed to do so with the following code:

len_sales = len(data["data"]["sales"])
total_p = 0 
for i in range(0,len_sales):
    total_p += float(data["data"]["sales"][i]["price"])
average = total_p/len_sales
print average

However, since the data dictionary retrieved is large, there is quite a bit of waiting time before the output is shown.

Therefore, I was wondering whether there is a more efficient and/or Pythonic way of achieving the same result in a shorter time.

Answer 1

First, you're not looping through a dict, you're looping through a list that happens to be inside a dict.

Second, doing something for every value in a list inherently requires visiting every value in the list; there's no way around the linear cost.

So, the only thing available is micro-optimizations, which probably won't make much difference: if your code is way too slow, 10% faster doesn't help, and if your code is already fast enough, you don't need it. Occasionally, though, they are needed.

And in this case, almost all of the micro-optimizations also make your code more readable and Pythonic, so there's no good reason not to do them.

First, you're accessing data["data"]["sales"] twice. The performance cost of that is probably negligible, but it also makes your code less readable, so let's fix that:

sales = data["data"]["sales"]

Next, instead of looping for i in range(0, len_sales): just to use sales[i], it's faster (and, again, more readable) to just loop over sales:

for sale in sales:
    total_p += float(sale["price"])

And now we can turn this loop into a generator expression, which is slightly more efficient (although that's partly canceled by the cost of adding a generator, so you might actually want to test this one):

prices = (float(sale["price"]) for sale in sales)

… and pass that directly to sum:

total_p = sum(float(sale["price"]) for sale in sales)

We can also use the mean function from the statistics module that comes with Python (3.4 and later) instead of doing it manually:

average = statistics.mean(float(sale["price"]) for sale in sales)

… except that you're apparently using Python 2, so you'd need to install the unofficial backport off PyPI (the official stats backport only goes back to 3.1; the 2.x version was abandoned), so let's skip that part.
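For readers who are on Python 3, a minimal sketch of the statistics.mean version follows. The data dict here is a cut-down stand-in for the API response in the question, not the real payload.

```python
# Assumes Python 3.4+, where statistics is part of the standard library.
import statistics

# Stand-in sample shaped like the response in the question (assumption).
data = {
    "status": "success",
    "data": {
        "sales": [
            {"sold_at": 133, "price": "1.8500"},
            {"sold_at": 139, "price": "2.6100"},
        ]
    },
}

sales = data["data"]["sales"]
# mean() accepts any iterable, so the generator expression can be passed directly.
average = statistics.mean(float(sale["price"]) for sale in sales)
print(average)
```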

Putting it all together:

sales = data["data"]["sales"]
total = sum(float(sale["price"]) for sale in sales)
average = total / len(sales)
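One thing the three lines above don't handle is an empty sales list, where the division raises ZeroDivisionError. A possible way to wrap them, as a sketch only: the function name average_price and the choice to return None for empty input are my assumptions, not something the question asked for.

```python
def average_price(sales):
    """Return the mean of the "price" fields, or None if sales is empty.

    average_price is a hypothetical helper name used for illustration.
    """
    if not sales:
        return None  # assumption: the caller treats None as "no data"
    total = sum(float(sale["price"]) for sale in sales)
    return total / len(sales)


# Stand-in sample data shaped like the entries in the question.
sample = [{"price": "1.8500"}, {"price": "2.6100"}]
print(average_price(sample))
print(average_price([]))
```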

A couple of things that might help (if it matters, you are definitely going to want to test with timeit):

You can use operator.itemgetter to get the price item. That means your expression is now just chaining two function calls, so you can chain two map calls:

total = sum(map(float, map(operator.itemgetter("price"), sales)))

That's probably less readable than the comprehension to anyone who isn't coming from a Lisp background, but it's certainly not terrible, and it might be a little faster.
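If the nested map calls are hard to read, it may help to see them unrolled one step at a time. This is just the same expression split up, run against a small stand-in list; the intermediate names are mine.

```python
import operator

# Stand-in sample data shaped like the entries in the question.
sales = [
    {"price": "1.8500", "hash_name": "Xuan881"},
    {"price": "2.6100", "hash_name": "Xuan881"},
]

# itemgetter("price") builds a callable equivalent to: lambda d: d["price"]
get_price = operator.itemgetter("price")

price_strings = map(get_price, sales)  # inner map: pull out each price string
prices = map(float, price_strings)     # outer map: convert each string to float
total = sum(prices)
print(total / len(sales))
```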

Alternatively, for moderately sized input, building a temporary list is sometimes worth it. Sure, you waste time allocating memory and copying data around, but iterating a list is faster than iterating a generator, so the only way to really be sure is to test.
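As a minimal sketch of the temporary-list variant (the sample list is a stand-in, not the real response), the only change is the square brackets, which make sum iterate a fully built list instead of a generator:

```python
# Stand-in sample data shaped like the entries in the question.
sales = [{"price": "1.8500"}, {"price": "2.6100"}]

# List comprehension: builds the whole list of floats first, then sums it.
total = sum([float(sale["price"]) for sale in sales])
average = total / len(sales)
print(average)
```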

One more thing that might make a difference is to move this whole thing into a function. Code at the top level doesn't have local variables, only globals, and they're slower to look up.

If you really need to squeeze out the last few percentage points, it's sometimes even worth copying global and builtin functions like float into locals. Of course that isn't going to help with map (since we're only accessing them once), but with a comprehension it might, so I'll show how to do it anyway:

import operator

def total_price(sales):
    _float = float  # cache the builtin in a local name
    pricegetter = operator.itemgetter("price")
    return sum(map(_float, map(pricegetter, sales)))
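For comparison, here is a sketch of the same trick applied to the generator-expression version, where the name is looked up once per item rather than once overall. The function name total_price_genexpr is made up for illustration, and whether this is measurably faster is something to check with timeit.

```python
def total_price_genexpr(sales):
    """Sum the "price" fields using a generator expression.

    total_price_genexpr is a hypothetical name used for illustration.
    """
    # The generator expression captures _float as a closure variable, which
    # avoids the global-then-builtin lookup of float on every iteration.
    _float = float
    return sum(_float(sale["price"]) for sale in sales)


# Stand-in sample data shaped like the entries in the question.
sample = [{"price": "1.8500"}, {"price": "2.6100"}]
print(total_price_genexpr(sample))
```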

The best way to benchmark code is to use the timeit module or, if you're using IPython, the %timeit magic, which works like this:

In [3]: %%timeit
... total_p = 0 
... for i in range(0,len_sales):
...     total_p += float(data["data"]["sales"][i]["price"])
10000 loops, best of 3: 28.4 µs per loop
In [4]: %timeit sum(float(sale["price"]) for sale in sales)
10000 loops, best of 3: 18.4 µs per loop
In [5]: %timeit sum(map(float, map(operator.itemgetter("price"), sales)))
100000 loops, best of 3: 16.9 µs per loop
In [6]: %timeit sum([float(sale["price"]) for sale in sales])
100000 loops, best of 3: 18.2 µs per loop
In [7]: %timeit total_price(sales)
100000 loops, best of 3: 17.2 µs per loop

So, on my laptop, with your sample data:

Looping directly over sales and using a generator expression instead of a statement is about 35% faster.
Using a list comprehension instead of a genexpr is about 1% faster than that.
Using map and itemgetter instead of a genexpr is about 10% faster.
Wrapping it in a function and caching the locals made things slightly slower. (Not surprising: as mentioned above, we only had a single lookup for each name anyway, thanks to map, so we're just adding a tiny overhead for probably zero benefit.)
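The percentages above come straight from the per-loop timings in the %timeit output. A small sketch of the arithmetic, using only the numbers printed there (the helper name percent_faster is mine); note that the map figure works out closer to 8%, which the list above rounds to about 10%.

```python
# Per-loop timings in microseconds, copied from the %timeit output above.
timings_us = {
    "index loop": 28.4,
    "genexpr": 18.4,
    "list comprehension": 18.2,
    "map + itemgetter": 16.9,
}


def percent_faster(old, new):
    """Percentage by which the time per loop dropped going from old to new."""
    return (old - new) / old * 100


genexpr_gain = percent_faster(timings_us["index loop"], timings_us["genexpr"])
listcomp_gain = percent_faster(timings_us["genexpr"], timings_us["list comprehension"])
map_gain = percent_faster(timings_us["genexpr"], timings_us["map + itemgetter"])
print(round(genexpr_gain), round(listcomp_gain), round(map_gain))
```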

Overall, sum(map(…map(…))) turned out to be the fastest for this particular input, on my laptop.

But of course you'll want to repeat this test in your real environment with your real input. When differences as small as 10% matter, you can't just assume that the details will transfer.
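If you aren't using IPython, a rough sketch of the same comparison with the plain timeit module could look like this. The sales list is a placeholder to be replaced with your real data, and the loop counts are arbitrary choices of mine.

```python
import operator
import timeit

# Placeholder input (assumption): swap in the real data["data"]["sales"].
sales = [{"price": "1.8500"}, {"price": "2.6100"}] * 1000


def with_genexpr():
    return sum(float(sale["price"]) for sale in sales)


def with_map():
    return sum(map(float, map(operator.itemgetter("price"), sales)))


NUMBER = 100  # calls per timing run
for func in (with_genexpr, with_map):
    # repeat() returns one total time per run; the minimum is the least noisy.
    best = min(timeit.repeat(func, number=NUMBER, repeat=3))
    print("%s: %.1f us per call" % (func.__name__, best / NUMBER * 1e6))
```

Taking the minimum of several runs mirrors the "best of 3" that %timeit reports above.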

One more thing: if you really need to speed things up, often the simplest thing to do is take the exact same code and run it in PyPy instead of the usual CPython interpreter. Repeating some of the above tests:

In [4]: %timeit sum(float(sale["price"]) for sale in sales)
680 ns ± 19.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [5]: %timeit sum(map(float, map(operator.itemgetter("price"), sales)))
800 ns ± 24.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
In [6]: %timeit sum([float(sale["price"]) for sale in sales])
694 ns ± 24.4 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Now the generator expression version is the fastest, but, more importantly, all three versions are roughly 20x as fast as they were in CPython. A 2000% improvement is a lot better than a 35% improvement.

Answer 2

You can use a library called statistics and find the mean of the list of prices. To get the list of prices, you could do a list comprehension:

prices = [float(v) for i in data["data"]["sales"] for k, v in i.iteritems() if k == "price"]

This will give you a list of prices. Now all you need to do with the above library is:

from statistics import mean
mean(prices)

Or, you could just do something like:

mean_price = sum(prices) / len(prices)

And you will have the average of the prices. By using a list comprehension, you have already optimised your code. See this and read the last paragraph of the answer too.
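Note that in a nested comprehension the for clauses run left to right, so the loop over the sales has to come before the loop over each sale's items. A Python 3 sketch of the whole approach, assuming a cut-down stand-in for the response and using items() in place of Python 2's iteritems():

```python
from statistics import mean  # standard library in Python 3.4+

# Stand-in sample shaped like the response in the question (assumption).
data = {
    "data": {
        "sales": [
            {"sold_at": 133, "price": "1.8500"},
            {"sold_at": 139, "price": "2.6100"},
        ]
    }
}

# Outer loop first (each sale), then the inner loop over its key/value pairs.
prices = [
    float(v)
    for sale in data["data"]["sales"]
    for k, v in sale.items()
    if k == "price"
]

mean_price = sum(prices) / len(prices)
print(mean(prices), mean_price)
```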
