
Fastest way to cut a pandas time-series

I'm looking for the fastest way to cut a time series, for example taking only the values that are more recent than a certain index.

I've found two commonly used methods:

df = original_series.truncate(before=example_time)

and

df = original_series[example_time:]

Which one is faster (for large time series, > 10**6 values)?
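Before comparing speed, it may help to confirm that the two forms select the same rows. The sketch below assumes a sorted DatetimeIndex with daily frequency; the data and the choice of `example_time` are made up for illustration and are not from the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 1,000 daily values on a sorted DatetimeIndex.
index = pd.date_range("2020-01-01", periods=1_000, freq="D")
original_series = pd.Series(np.arange(1_000), index=index)

# Hypothetical cut point: keep everything from the 601st timestamp onward.
example_time = index[600]

# Both forms are label-based and include example_time itself.
truncated = original_series.truncate(before=example_time)
sliced = original_series[example_time:]

same_result = truncated.equals(sliced)
print(same_result, len(sliced))
```

Note that `truncate` requires a sorted index, so the comparison only makes sense for a monotonic time series like this one.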

This usually depends on what your DataFrame index is. Throwing a random DataFrame of 10^7 values into timeit, we get the following.

From a performance standpoint, truncate is less efficient, since pandas is optimized for integer-based indexing via numpy.

Truncate: 
62.6 ms ± 3.63 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Bracket Indexing:
54.1 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

ILoc:
69.5 µs ± 4.52 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Loc:
92 µs ± 5.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Ix (which is deprecated):
110 µs ± 8.44 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
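The answer does not show the benchmark script, so here is a rough sketch of how a comparison like this could be run with the standard `timeit` module. The DataFrame size (10**6 rows, smaller than the 10^7 used above), the minute-frequency DatetimeIndex, the column name, and the cut point are all assumptions; absolute numbers will differ by machine and pandas version. `ix` is left out because it has been removed from recent pandas releases.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed setup: random values on a sorted, minute-frequency DatetimeIndex.
n = 10**6
index = pd.date_range("2000-01-01", periods=n, freq="min")
df = pd.DataFrame({"value": np.random.rand(n)}, index=index)

# Hypothetical cut point in the middle of the series.
pos = n // 2
cut = index[pos]

candidates = {
    "truncate": lambda: df.truncate(before=cut),
    "bracket": lambda: df[cut:],
    "loc": lambda: df.loc[cut:],
    "iloc": lambda: df.iloc[pos:],
}

# Best of 3 repeats, 20 calls each, reported as seconds per call.
calls = 20
timings = {
    name: min(timeit.repeat(fn, number=calls, repeat=3)) / calls
    for name, fn in candidates.items()
}

for name, seconds in timings.items():
    print(f"{name:>8}: {seconds * 1e6:10.1f} us per call")
```

In IPython or Jupyter the same comparison can be done interactively with `%timeit`, which is the tool that produces output in the format shown above.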

EDIT: This is all on pandas 0.24.2. Back in the 0.14-0.18 versions, loc performance was much, much worse.
