Fastest way to cut a pandas time series

Question

I'm looking for the fastest way to cut a time series, for example taking only the values that are more recent than a certain index.

I've found two commonly used methods:

df = original_series.truncate(before=example_time)

and

df = original_series[example_time:]

Which one is faster (for large time series with more than 10**6 values)?
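For reference, here is a minimal sketch of the two approaches side by side. The data is hypothetical (a daily, sorted DatetimeIndex with an arbitrarily chosen cut point); only the two calls themselves come from the question. It suggests that, on such an index, both forms keep the row at example_time and everything after it, so the comparison is purely about speed.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value per day on a sorted DatetimeIndex.
index = pd.date_range("2019-01-01", periods=1_000, freq="D")
original_series = pd.Series(np.arange(len(index)), index=index)
example_time = index[600]  # arbitrary cut point chosen for illustration

# The two methods from the question.
truncated = original_series.truncate(before=example_time)
sliced = original_series[example_time:]

# Both are inclusive of example_time and return the same rows.
same_result = truncated.equals(sliced)
print(same_result, len(sliced))
```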

Answer 1

This usually depends on what your DataFrame index is. From a performance standpoint, truncate is less efficient, as pandas is optimized for integer-based indexing via numpy. Throwing a random DataFrame of 10^7 values into timeit, we get the following:

Truncate: 
62.6 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Bracket Indexing:
54.1 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

ILoc:
69.5 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Loc:
92 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Ix (which is deprecated):
110 µs ± 8.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

EDIT: This is all on pandas 0.24.2; back in the 0.14-0.18 versions, loc performance was much, much worse.
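The answer does not show the benchmark code, so the following is only a sketch of how such a comparison might be set up with the timeit module. The frame size (10**6 rather than 10^7, to keep the run short), the minute-frequency index, the column name, and the cut point are all assumptions. The deprecated ix indexer is left out because it no longer exists in current pandas. Absolute numbers will vary with the pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd

n = 10**6  # assumed size; the answer above used 10^7 values
index = pd.date_range("2000-01-01", periods=n, freq="min")
df = pd.DataFrame({"value": np.random.rand(n)}, index=index)
example_time = index[n // 2]  # arbitrary cut point

# Integer position of example_time, needed for the positional (iloc) variant.
pos = index.searchsorted(example_time)

candidates = {
    "truncate": lambda: df.truncate(before=example_time),
    "bracket": lambda: df[example_time:],
    "iloc": lambda: df.iloc[pos:],
    "loc": lambda: df.loc[example_time:],
}

results = {}
for name, func in candidates.items():
    # Best of 3 repeats of 10 calls each, converted to seconds per call.
    results[name] = min(timeit.repeat(func, number=10, repeat=3)) / 10
    print(f"{name:>8}: {results[name] * 1e6:10.1f} µs per call")
```

Note that the iloc variant needs the integer position up front, so a fair comparison against the label-based forms should arguably include the searchsorted lookup as well.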

Answer 1 timestamp: 2019-10-22 17:55:13
