Why is Pandas.eval() with numexpr so slow?

Test code:

import numpy as np
import pandas as pd

COUNT = 1000000

df = pd.DataFrame({
    'y': np.random.normal(0, 1, COUNT),
    'z': np.random.gamma(50, 1, COUNT),
})

%timeit df.y[(10 < df.z) & (df.z < 50)].mean()
%timeit df.y.values[(10 < df.z.values) & (df.z.values < 50)].mean()
%timeit df.eval('y[(10 < z) & (z < 50)].mean()', engine='numexpr')

The output on my machine (a fairly fast x86-64 Linux desktop with Python 3.6) is:

17.8 ms ±  1.3 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
8.44 ms ±  502 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
46.4 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs,  10 loops each)

I understand why the second line is a bit faster (it ignores the Pandas index). But why is the eval() approach using numexpr so slow? Shouldn't it at least be faster than the first approach? The documentation certainly makes it seem like it would be: https://pandas.pydata.org/pandas-docs/stable/enhancingperf.html

From the investigation presented below, it looks like the unspectacular reason for the worse performance is "overhead".

Only a small part of the expression y[(10 < z) & (z < 50)].mean() is done via the numexpr module. numexpr doesn't support indexing, thus we can only hope for (10 < z) & (z < 50) to be sped up - everything else will be mapped to pandas operations.
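
As a quick sanity check (a sketch, not a measurement from the post), one can time just the mask expression both ways; this is the only part that can actually go through numexpr:

%timeit (10 < df.z) & (df.z < 50)                          # plain pandas/NumPy
%timeit df.eval('(10 < z) & (z < 50)', engine='numexpr')   # the part numexpr can handle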

However, (10 < z) & (z < 50) is not the bottleneck here, as can easily be seen:

%timeit df.y[(10 < df.z) & (df.z < 50)].mean()  # 16.7 ms
mask=(10 < df.z) & (df.z < 50)                  
%timeit df.y[mask].mean()                       # 13.7 ms
%timeit df.y[mask]                              # 13.2 ms

df.y[mask] takes the lion's share of the running time.

We can compare the profiler output for df.y[mask] and df.eval('y[mask]') to see what makes the difference.

When I use the following script:

import numpy as np
import pandas as pd

COUNT = 1000000

df = pd.DataFrame({
    'y': np.random.normal(0, 1, COUNT),
    'z': np.random.gamma(50, 1, COUNT),
})

mask = (10 < df.z) & (df.z < 50)
df['m']=mask

for _ in range(500):
   df.y[df.m] 
   # OR 
   #df.eval('y[m]', engine='numexpr')

and run it with python -m cProfile -s cumulative run.py (or %prun -s cumulative <...> in IPython), I can see the following profiles.

For the direct call of the pandas functionality:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    419/1    0.013    0.000    7.228    7.228 {built-in method builtins.exec}
        1    0.006    0.006    7.228    7.228 run.py:1(<module>)
      500    0.005    0.000    6.589    0.013 series.py:764(__getitem__)
      500    0.003    0.000    6.475    0.013 series.py:812(_get_with)
      500    0.003    0.000    6.468    0.013 series.py:875(_get_values)
      500    0.009    0.000    6.445    0.013 internals.py:4702(get_slice)
      500    0.006    0.000    3.246    0.006 range.py:491(__getitem__)
      505    3.146    0.006    3.236    0.006 base.py:2067(__getitem__)
      500    3.170    0.006    3.170    0.006 internals.py:310(_slice)
    635/2    0.003    0.000    0.414    0.207 <frozen importlib._bootstrap>:958(_find_and_load)

We can see that almost 100% of the time is spent in series.__getitem__, without any overhead.

For the call via df.eval(...), the situation is quite different:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    453/1    0.013    0.000   12.702   12.702 {built-in method builtins.exec}
        1    0.015    0.015   12.702   12.702 run.py:1(<module>)
      500    0.013    0.000   12.090    0.024 frame.py:2861(eval)
 1000/500    0.025    0.000   10.319    0.021 eval.py:153(eval)
 1000/500    0.007    0.000    9.247    0.018 expr.py:731(__init__)
 1000/500    0.004    0.000    9.236    0.018 expr.py:754(parse)
 4500/500    0.019    0.000    9.233    0.018 expr.py:307(visit)
 1000/500    0.003    0.000    9.105    0.018 expr.py:323(visit_Module)
 1000/500    0.002    0.000    9.102    0.018 expr.py:329(visit_Expr)
      500    0.011    0.000    9.096    0.018 expr.py:461(visit_Subscript)
      500    0.007    0.000    6.874    0.014 series.py:764(__getitem__)
      500    0.003    0.000    6.748    0.013 series.py:812(_get_with)
      500    0.004    0.000    6.742    0.013 series.py:875(_get_values)
      500    0.009    0.000    6.717    0.013 internals.py:4702(get_slice)
      500    0.006    0.000    3.404    0.007 range.py:491(__getitem__)
      506    3.289    0.007    3.391    0.007 base.py:2067(__getitem__)
      500    3.282    0.007    3.282    0.007 internals.py:310(_slice)
      500    0.003    0.000    1.730    0.003 generic.py:432(_get_index_resolvers)
     1000    0.014    0.000    1.725    0.002 generic.py:402(_get_axis_resolvers)
     2000    0.018    0.000    1.685    0.001 base.py:1179(to_series)
     1000    0.003    0.000    1.537    0.002 scope.py:21(_ensure_scope)
     1000    0.014    0.000    1.534    0.002 scope.py:102(__init__)
      500    0.005    0.000    1.476    0.003 scope.py:242(update)
      500    0.002    0.000    1.451    0.003 inspect.py:1489(stack)
      500    0.021    0.000    1.449    0.003 inspect.py:1461(getouterframes)
    11000    0.062    0.000    1.415    0.000 inspect.py:1422(getframeinfo)
     2000    0.008    0.000    1.276    0.001 base.py:1253(_to_embed)
     2035    1.261    0.001    1.261    0.001 {method 'copy' of 'numpy.ndarray' objects}
     1000    0.015    0.000    1.226    0.001 engines.py:61(evaluate)
    11000    0.081    0.000    1.081    0.000 inspect.py:757(findsource)

Once again, about 7 seconds are spent in series.__getitem__, but there is also about 6 seconds of overhead - for example, about 2 seconds in frame.py:2861(eval) and about 2 seconds in expr.py:461(visit_Subscript).

I did only a superficial investigation (see more details further below), but this overhead doesn't seem to be just constant, but at least linear in the number of elements in the series. For example, there is method 'copy' of 'numpy.ndarray' objects, which means that data is copied (it is quite unclear why this would be necessary per se).
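
One way to probe that claim (a rough sketch under the same setup as above, not measurements from the post) is to repeat the comparison for different sizes and watch whether the absolute gap between the two calls grows:

import timeit

for count in [10**5, 10**6, 10**7]:
    tmp = pd.DataFrame({
        'y': np.random.normal(0, 1, count),
        'z': np.random.gamma(50, 1, count),
    })
    tmp['m'] = (10 < tmp.z) & (tmp.z < 50)
    direct   = timeit.timeit(lambda: tmp.y[tmp.m], number=50)
    via_eval = timeit.timeit(lambda: tmp.eval('y[m]', engine='numexpr'), number=50)
    # a purely constant overhead would keep (via_eval - direct) flat across sizes
    print(count, direct, via_eval, via_eval - direct)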

My takeaway: using pd.eval has advantages as long as the evaluated expression can be evaluated by numexpr alone. As soon as this is not the case, there may no longer be gains but rather losses, due to the quite large overhead.
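
A practical consequence (a sketch, not something from the original post): let eval/numexpr handle only the element-wise mask and do the indexing and reduction outside of it:

mask = df.eval('(10 < z) & (z < 50)', engine='numexpr')   # numexpr-friendly part only
result = df.y.values[mask.values].mean()                  # indexing and mean outside eval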


Using line_profiler (here I use the %lprun magic, after loading it with %load_ext line_profiler, on a function run() which is more or less a copy of the script above; a minimal sketch of run() is given after the command below), we can easily find where the time is lost in DataFrame.eval:

%lprun -f pd.core.frame.DataFrame.eval
       -f pd.core.frame.DataFrame._get_index_resolvers 
       -f pd.core.frame.DataFrame._get_axis_resolvers  
       -f pd.core.indexes.base.Index.to_series 
       -f pd.core.indexes.base.Index._to_embed
       run()
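
run() is not spelled out in the original; a minimal version matching the description above (an assumption) would be:

def run():
    for _ in range(500):
        df.eval('y[m]', engine='numexpr')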

Here we can see where the additional 10% are spent:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  2861                                               def eval(self, expr, 
....
  2951        10        206.0     20.6      0.0          from pandas.core.computation.eval import eval as _eval
  2952                                           
  2953        10        176.0     17.6      0.0          inplace = validate_bool_kwarg(inplace, 'inplace')
  2954        10         30.0      3.0      0.0          resolvers = kwargs.pop('resolvers', None)
  2955        10         37.0      3.7      0.0          kwargs['level'] = kwargs.pop('level', 0) + 1
  2956        10         17.0      1.7      0.0          if resolvers is None:
  2957        10     235850.0  23585.0      9.0              index_resolvers = self._get_index_resolvers()
  2958        10       2231.0    223.1      0.1              resolvers = dict(self.iteritems()), index_resolvers
  2959        10         29.0      2.9      0.0          if 'target' not in kwargs:
  2960        10         19.0      1.9      0.0              kwargs['target'] = self
  2961        10         46.0      4.6      0.0          kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
  2962        10    2392725.0 239272.5     90.9          return _eval(expr, inplace=inplace, **kwargs)

and _get_index_resolvers() can be drilled down to Index._to_embed:

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
  1253                                               def _to_embed(self, keep_tz=False, dtype=None):
  1254                                                   """
  1255                                                   *this is an internal non-public method*
  1256                                           
  1257                                                   return an array repr of this object, potentially casting to object
  1258                                           
  1259                                                   """
  1260        40         73.0      1.8      0.0          if dtype is not None:
  1261                                                       return self.astype(dtype)._to_embed(keep_tz=keep_tz)
  1262                                           
  1263        40     201490.0   5037.2    100.0          return self.values.copy()

This is where the O(n) copying happens.
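
To get a feel for that cost (a sketch, not part of the original profile): the return self.values.copy() line amounts to a fresh million-element array on every eval call for the one-million-row frame above:

%timeit df.index.values.copy()   # the O(n) copy behind Index._to_embed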
