
Fast pandas.DataFrame initialization

Question

What is an efficient way to get the following pandas DataFrame? (Update: the numbers change each time.)

   alpha  beta  gamma
0    1.5   2.5    3.5

[1 rows x 3 columns]

Motivation

I added a pandas.DataFrame API to some of my methods so that I can do calculations in batches.

When I replicated some of my test cases for the new API, the execution time of my test benches rose from 200 ms to over 8 seconds. Profiling showed that the main cause is the creation of 20k pandas.DataFrame objects.

See the comparison:

In [1]: import pandas as pd

In [2]: timeit pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])
1000 loops, best of 3: 405 us per loop

In [3]: timeit {'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}
1000000 loops, best of 3: 200 ns per loop

It seems that creating a DataFrame object is about 2000 times slower than creating a lower-level structure. I tried to optimize it, but this is as fast as I got:

In [4]: import numpy as np

In [5]: timeit pd.DataFrame(np.array([[1.5, 2.5, 3.5]]), columns=['alpha', 'beta', 'gamma'])
1000 loops, best of 3: 144 us per loop

This is still 720 times slower than the dict. Is it possible to be faster? Creating NumPy arrays, for example, is only 10 times slower:

In [6]: timeit np.array([[1.5, 2.5, 3.5]])
100000 loops, best of 3: 1.99 us per loop
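The measurements above come from an interactive IPython session. As a rough sketch, the same comparison could be reproduced in a plain script with the standard-library timeit module. The helper names from_dict, from_array and time_per_call are hypothetical and not part of the original post, and the absolute timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

COLUMNS = ['alpha', 'beta', 'gamma']


def from_dict():
    # Same construction as In [2]: scalar values plus an explicit index.
    return pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])


def from_array():
    # Same construction as In [5]: a 1x3 NumPy array plus column names.
    return pd.DataFrame(np.array([[1.5, 2.5, 3.5]]), columns=COLUMNS)


def time_per_call(func, number=1000):
    # Best average time per call in seconds over 3 repeats (hypothetical helper).
    return min(timeit.repeat(func, number=number, repeat=3)) / number


if __name__ == '__main__':
    for func in (from_dict, from_array):
        print(func.__name__, time_per_call(func))
```

Taking the minimum over several repeats mirrors the "best of 3" figure that IPython's timeit reports.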

Answer

You could keep a global DataFrame for your tests and just do df = global_df.copy(), for example:

In [1]: import pandas as pd

In [2]: global_df = pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])

In [3]: timeit global_df.copy()
10000 loops, best of 3: 20.2 us per loop
