Fastest way to cut a pandas time-series
I'm looking for the fastest way to cut a time series, for example taking only the values that are more recent than a certain index.
I've found two commonly used methods:
df = original_series.truncate(before=example_time)
and
df = original_series[example_time:]
Which one is faster for large time series (more than 10**6 values)?
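For reference, here is a minimal sketch showing that the two calls above select the same rows, so the only open question is speed. The data, the minute frequency, and the cut-off timestamp are made up for illustration; only original_series and example_time come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one value per minute on a sorted DatetimeIndex.
index = pd.date_range("2020-01-01", periods=1_000, freq="min")
original_series = pd.Series(np.arange(1_000), index=index)

# Hypothetical cut-off: ten minutes after the first sample.
example_time = pd.Timestamp("2020-01-01 00:10:00")

truncated = original_series.truncate(before=example_time)
sliced = original_series[example_time:]

# Both keep example_time itself and everything after it.
same_result = truncated.equals(sliced)
print(same_result, len(truncated), len(sliced))
```

Note that truncate expects a sorted index, which is the usual case for a time series.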
This usually depends on what your DataFrame index is. Throwing a random DataFrame of 10^7 values into timeit, we get the following.
From a performance standpoint, truncate is the least efficient, as pandas is optimized for integer-based indexing via NumPy.
Truncate:
62.6 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Bracket Indexing:
54.1 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ILoc:
69.5 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Loc:
92 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Ix (which is deprecated):
110 µs ± 8.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
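The exact benchmark script is not shown, so the following is only a sketch of how timings like these could be reproduced. The column name, the minute-frequency index, the cut-off point, and the repeat counts are all assumptions, and ix is left out because it no longer exists in current pandas.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: n random values on a sorted, minute-frequency DatetimeIndex.
n = 10**6  # raise to 10**7 to match the size quoted above
index = pd.date_range("2000-01-01", periods=n, freq="min")
df = pd.DataFrame({"value": np.random.rand(n)}, index=index)

example_time = index[n // 2]                  # label to cut at (arbitrary)
position = index.searchsorted(example_time)   # integer position for iloc

candidates = {
    "truncate": lambda: df.truncate(before=example_time),
    "bracket": lambda: df[example_time:],
    "loc": lambda: df.loc[example_time:],
    "iloc": lambda: df.iloc[position:],
}

# Every variant should select the same rows before we compare speed.
expected = df.iloc[position:]
results_match = all(fn().equals(expected) for fn in candidates.values())

# Best of 3 repeats of 20 calls each, reported as time per call.
timings = {
    name: min(timeit.repeat(fn, number=20, repeat=3)) / 20
    for name, fn in candidates.items()
}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:>8}: {seconds * 1e6:10.1f} µs per call")
```

Absolute numbers will differ by machine and pandas version. The iloc variant here times only the positional slice, since the searchsorted lookup is done once up front.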
EDIT: This is all on pandas 0.24.2. Back in versions 0.14-0.18, loc performance was much, much worse.