
Pandas quantile function very slow

I want to calculate quantiles/percentiles on a Pandas DataFrame. However, the function is extremely slow. I repeated the calculation with NumPy and found that it takes almost 10,000 times longer in Pandas!

Does anybody know why this is the case? Should I calculate the quantiles with NumPy and then create a new DataFrame instead of using Pandas?

See my code below:

import time
import pandas as pd
import numpy as np

q = np.array([0.1,0.4,0.6,0.9])
data = np.random.randn(10000, 4)
df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd'])
time1 = time.time()
pandas_quantiles = df.quantile(q, axis=1)
time2 = time.time()
print('Pandas took %0.3f ms' % ((time2-time1)*1000.0))

time1 = time.time()
numpy_quantiles = np.percentile(data, q*100, axis=1)
time2 = time.time()
print('Numpy took %0.3f ms' % ((time2-time1)*1000.0))

print((pandas_quantiles.values == numpy_quantiles).all())
# Output:
# Pandas took 15337.531 ms
# Numpy took 1.653 ms
# True

This issue is solved in recent versions of Pandas with Python 3. Pandas takes less than twice as long as NumPy on small arrays, and I get a 5% difference on larger arrays.

I get the following output with pandas 0.24.1 and Python 3:

import time
import pandas as pd
import numpy as np

q = np.array([0.1,0.4,0.6,0.9])
data = np.random.randn(10000, 4)
df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd'])
time1 = time.time()
pandas_quantiles = df.quantile(q, axis=1)
time2 = time.time()
print('Pandas took %0.3f ms' % ((time2-time1)*1000.0))

time1 = time.time()
numpy_quantiles = np.percentile(data, q*100, axis=1)
time2 = time.time()
print('Numpy took %0.3f ms' % ((time2-time1)*1000.0))

print((pandas_quantiles.values == numpy_quantiles).all())
# Output:
# Pandas took 3.415 ms
# Numpy took 2.040 ms
# True
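
If you are stuck on an older Pandas version where df.quantile is slow, the workaround suggested in the question (compute with NumPy, then wrap the result in a new DataFrame) could look roughly like the sketch below. The helper name rowwise_quantiles is hypothetical, and the sketch assumes all columns are numeric and contain no NaN values, since np.percentile does not skip NaN the way df.quantile does.

```python
import numpy as np
import pandas as pd


def rowwise_quantiles(df, q):
    """Hypothetical helper: row-wise quantiles computed with NumPy.

    Returns a DataFrame laid out like df.quantile(q, axis=1):
    one row per quantile, one column per original row label.
    Assumes numeric data without NaN values.
    """
    q = np.asarray(q, dtype=float)
    # np.percentile expects percentages in [0, 100], hence q * 100
    values = np.percentile(df.values, q * 100, axis=1)
    return pd.DataFrame(values, index=q, columns=df.index)


q = [0.1, 0.4, 0.6, 0.9]
data = np.random.randn(10000, 4)
df = pd.DataFrame(data, columns=['a', 'b', 'c', 'd'])

result = rowwise_quantiles(df, q)
print(result.shape)
```

If the data may contain missing values, np.nanpercentile is the NumPy counterpart that ignores NaN, at some cost in speed.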
