Why is bokeh so much slower than matplotlib
I plotted a box plot in Bokeh and another in matplotlib. For the same data, plotting in Bokeh is about 100 times slower. Why does Bokeh take so long? Here is the code, which I ran in a Jupyter notebook:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
from bokeh.charts import BoxPlot, output_notebook, show
from time import time
%matplotlib inline
# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N
x2 = -2 + np.random.randn(N)
y2 = ['b'] * N
X = list(x1) + list(x2)
Y = y1 + y2
data = pd.DataFrame()
data['Vals'] = X
data['Class'] = Y
df = data.apply(np.random.permutation)
# Time the bokeh plot
start_time = time()
p = BoxPlot(data, values='Vals', label='Class',\
title="MPG Summary (grouped by CYL, ORIGIN)")
output_notebook()
show(p)
end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))
# time the matplotlib plot
start_time = time()
data.boxplot(column='Vals', by='Class', sym = 'o')
end_time = time()
print("Total time taken for matplotlib is {0}".format(end_time - start_time))
The print statements produce the following output:
Total time taken for Bokeh is 11.8056321144104
Total time taken for matplotlib is 0.1586170196533203
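There are some issues with bokeh.charts.BoxPlot. Unfortunately, bokeh.charts does not have a maintainer at the moment, so I can't say when it might be fixed or improved.
However, in case it is useful to you, I will demonstrate below that you can use the well-established and stable bokeh.plotting API to do things "by hand", and then the timing is comparable to MPL: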
from time import time
import pandas as pd
import numpy as np
from bokeh.io import output_notebook, show
from bokeh.plotting import figure
output_notebook()
# Generate data
N = 100000
x1 = 2 + np.random.randn(N)
y1 = ['a'] * N
x2 = -2 + np.random.randn(N)
y2 = ['b'] * N
X = list(x1) + list(x2)
Y = y1 + y2
df = pd.DataFrame()
df['Vals'] = X
df['Class'] = Y
# Time the bokeh plot
start_time = time()
# find the quartiles and IQR for each category
groups = df.groupby('Class')
q1 = groups.quantile(q=0.25)
q2 = groups.quantile(q=0.5)
q3 = groups.quantile(q=0.75)
iqr = q3 - q1
upper = q3 + 1.5*iqr
lower = q1 - 1.5*iqr
cats = ['a', 'b']
p = figure(x_range=cats)
# if no outliers, shrink lengths of stems to be no longer than the minimums or maximums
qmin = groups.quantile(q=0.00)
qmax = groups.quantile(q=1.00)
# clamp the whisker ends stored in the 'Vals' column to the data range
upper['Vals'] = [min(x, y) for (x, y) in zip(qmax['Vals'], upper['Vals'])]
lower['Vals'] = [max(x, y) for (x, y) in zip(qmin['Vals'], lower['Vals'])]
# stems
p.segment(cats, upper.Vals, cats, q3.Vals, line_color="black")
p.segment(cats, lower.Vals, cats, q1.Vals, line_color="black")
# boxes
p.vbar(cats, 0.7, q2.Vals, q3.Vals, fill_color="#E08E79", line_color="black")
p.vbar(cats, 0.7, q1.Vals, q2.Vals, fill_color="#3B8686", line_color="black")
# whiskers (almost-0 height rects simpler than segments)
p.rect(cats, lower.Vals, 0.2, 0.01, line_color="black")
p.rect(cats, upper.Vals, 0.2, 0.01, line_color="black")
p.xgrid.grid_line_color = None
p.ygrid.grid_line_color = "white"
p.grid.grid_line_width = 2
p.xaxis.major_label_text_font_size="12pt"
show(p)
end_time = time()
print("Total time taken for Bokeh is {0}".format(end_time - start_time))
It's a fair amount of code, but it is simple enough to wrap up into a reusable function. For me, the above resulted in:
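The reusable function mentioned above could look something like the following. This is only a minimal sketch, not part of Bokeh: the names boxplot_stats and make_boxplot are hypothetical, and the whiskers follow the same 1.5 * IQR rule, clamped to the data range, that the code above uses. The Bokeh import sits inside make_boxplot so that the statistics helper can be used and checked without Bokeh.

```python
import numpy as np
import pandas as pd


def boxplot_stats(df, values, label):
    """Return per-category quartiles and whisker ends (hypothetical helper)."""
    groups = df.groupby(label)[values]
    stats = pd.DataFrame({
        "q1": groups.quantile(0.25),
        "q2": groups.quantile(0.50),
        "q3": groups.quantile(0.75),
        "min": groups.min(),
        "max": groups.max(),
    })
    iqr = stats["q3"] - stats["q1"]
    # Whiskers reach 1.5 * IQR past the box, but never beyond the data itself
    stats["upper"] = np.minimum(stats["q3"] + 1.5 * iqr, stats["max"])
    stats["lower"] = np.maximum(stats["q1"] - 1.5 * iqr, stats["min"])
    return stats


def make_boxplot(df, values, label):
    """Build a Bokeh figure with one box per category (hypothetical helper)."""
    from bokeh.plotting import figure  # only needed when actually plotting

    s = boxplot_stats(df, values, label)
    cats = [str(c) for c in s.index]
    p = figure(x_range=cats)
    # stems
    p.segment(x0=cats, y0=s["upper"], x1=cats, y1=s["q3"], line_color="black")
    p.segment(x0=cats, y0=s["lower"], x1=cats, y1=s["q1"], line_color="black")
    # boxes (upper and lower halves, split at the median)
    p.vbar(x=cats, width=0.7, bottom=s["q2"], top=s["q3"],
           fill_color="#E08E79", line_color="black")
    p.vbar(x=cats, width=0.7, bottom=s["q1"], top=s["q2"],
           fill_color="#3B8686", line_color="black")
    # whisker caps (almost-0 height rects)
    p.rect(x=cats, y=s["lower"], width=0.2, height=0.01, line_color="black")
    p.rect(x=cats, y=s["upper"], width=0.2, height=0.01, line_color="black")
    return p
```

With the DataFrame from the code above, this would be called as show(make_boxplot(df, 'Vals', 'Class')). Only the handful of summary values per category is sent to the browser, not the 200,000 raw points. Outlier markers are not drawn in this sketch.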