What is an efficient way to create the following pandas DataFrame? (Update: the numbers change each time.)
alpha beta gamma
0 1.5 2.5 3.5
[1 rows x 3 columns]
I added a pandas.DataFrame API to some of my methods to be able to do calculations in batches.
When replicating some of my test cases for the new API, the execution time of my test benches rose from 200 ms to over 8 seconds. A profiling run showed that the main cause is the creation of 20k pandas.DataFrame
objects.
See the comparison:
In [1]: import pandas as pd
In [2]: timeit pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])
1000 loops, best of 3: 405 us per loop
In [3]: timeit {'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}
1000000 loops, best of 3: 200 ns per loop
It seems that creating a DataFrame object is about 2000 times slower than creating the lower-level structure. I tried to optimize it, but this is the fastest I got:
In [4]: import numpy as np
In [5]: timeit pd.DataFrame(np.array([[1.5, 2.5, 3.5]]), columns=['alpha', 'beta', 'gamma'])
1000 loops, best of 3: 144 us per loop
This is still about 720 times slower than the dict. Can it be made faster? For comparison, creating a plain numpy array is only about 10 times slower:
In [6]: timeit np.array([[1.5, 2.5, 3.5]])
100000 loops, best of 3: 1.99 us per loop
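For reference, the IPython magics above can be reproduced as a plain script with the standard-library `timeit` module; this is a sketch, and the absolute numbers will of course differ by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Time the same three constructions as in the IPython session.
t_dict = timeit.timeit(
    lambda: pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0]),
    number=1000)
t_arr = timeit.timeit(
    lambda: pd.DataFrame(np.array([[1.5, 2.5, 3.5]]),
                         columns=['alpha', 'beta', 'gamma']),
    number=1000)
t_np = timeit.timeit(lambda: np.array([[1.5, 2.5, 3.5]]), number=1000)

# Report per-loop times in microseconds.
print(f"dict  -> DataFrame: {t_dict / 1000 * 1e6:.1f} us per loop")
print(f"array -> DataFrame: {t_arr / 1000 * 1e6:.1f} us per loop")
print(f"bare numpy array:   {t_np / 1000 * 1e6:.1f} us per loop")
```

Both constructor paths produce the same one-row frame, so either can be swapped in wherever the other is used.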
You could create a global DataFrame once for your tests and just do df = global_df.copy(). For example:
In [1]: global_df = pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])
In [2]: timeit global_df.copy()
10000 loops, best of 3: 20.2 us per loop
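Since the numbers change each time, you can copy the template and then overwrite the row values, which skips the expensive constructor entirely. A minimal sketch (the helper name `make_frame` is hypothetical):

```python
import pandas as pd

# Build the template frame once, outside the hot loop.
GLOBAL_DF = pd.DataFrame({'alpha': 1.5, 'beta': 2.5, 'gamma': 3.5}, [0])

def make_frame(alpha, beta, gamma):
    """Return a fresh one-row frame with the given values by copying
    the template instead of calling the DataFrame constructor."""
    df = GLOBAL_DF.copy()
    df.iloc[0] = [alpha, beta, gamma]
    return df
```

make_frame(4.5, 5.5, 6.5) then yields a frame with the new numbers while GLOBAL_DF itself stays untouched, so each test gets an independent object.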