How can I create a Stacked Bar plot in Python where the y axis is NOT based on counts?
I have the following Pandas DataFrame (abbreviated here):
import pandas as pd

df = pd.DataFrame([
    ("Distal Lung AT2", 0.4269588779192778, 20),
    ("Lung Ciliated epithelial cells", 0.28642167657082035, 20),
    ("Distal Lung AT2", 0.4488207834077291, 15),
    ("Lung Ciliated epithelial cells", 0.27546336897259094, 15),
    ("Distal Lung AT2", 0.45502553604960105, 10),
    ("Lung Ciliated epithelial cells", 0.29080413886147555, 10),
    ("Distal Lung AT2", 0.48481604554028446, 5),
    ("Lung Ciliated epithelial cells", 0.3178232409599174, 5)],
    columns=["features", "importance", "num_features"])
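I want to create a stacked bar plot in which the x axis represents num_features (so rows with the same num_features should be grouped into one bar), the y axis represents importance, and each bar is split into segments colored by features.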
I tried to do this with plotnine, as follows:
from plotnine import aes, geom_bar, ggplot, xlab, ylab

plot = (
    ggplot(df, aes(x="num_features", y="importance", fill="features"))
    + geom_bar(stat="identity")
    + xlab("Number of Features")
    + ylab("")
)
However, when I try to save the plot so that I can view it, using ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png")), I get:
Traceback (most recent call last):
File "/home/mdanb/plot_top_features_iteratively.py", line 94, in <module>
plot_stacked_bar_plots(backwards_elim_dirs)
File "/home/mdanb/plot_top_features_iteratively.py", line 87, in plot_stacked_bar_plots
ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png"))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 736, in ggsave
return plot.save(*arg, **kwargs)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 724, in save
fig, p = self.draw(return_ggplot=True)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 203, in draw
self._build()
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 311, in _build
layers.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 79, in compute_position
l.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 393, in compute_position
data = self.position.compute_layer(data, params, layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 56, in compute_layer
return groupby_apply(data, 'PANEL', fn)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/utils.py", line 638, in groupby_apply
lst.append(func(d, *args, **kwargs))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 54, in fn
return cls.compute_panel(pdata, scales, params)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position_stack.py", line 85, in compute_panel
trans = scales.y.trans
AttributeError: 'scale_y_discrete' object has no attribute 'trans'
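Following this post, I also looked into using Pandas directly instead of plotnine. However, that did not fully solve my problem, because there the bars are stacked by counts, whereas I specifically want them stacked by the values of a column (importance).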
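For the pandas-only route mentioned in the question, a common way to stack by value rather than by count is to reshape the data first, so that each cell already holds an importance value, and then pass stacked=True to the bar plot. This is a minimal sketch assuming matplotlib is installed; the output filename is a placeholder, not the asker's figure_path.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the script runs headless

from pathlib import Path

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame(
    [
        ("Distal Lung AT2", 0.4269588779192778, 20),
        ("Lung Ciliated epithelial cells", 0.28642167657082035, 20),
        ("Distal Lung AT2", 0.4488207834077291, 15),
        ("Lung Ciliated epithelial cells", 0.27546336897259094, 15),
        ("Distal Lung AT2", 0.45502553604960105, 10),
        ("Lung Ciliated epithelial cells", 0.29080413886147555, 10),
        ("Distal Lung AT2", 0.48481604554028446, 5),
        ("Lung Ciliated epithelial cells", 0.3178232409599174, 5),
    ],
    columns=["features", "importance", "num_features"],
)

# Wide form: one row per num_features, one column per feature,
# and each cell holds the importance value itself (no counting involved).
wide = df.pivot(index="num_features", columns="features", values="importance")

# stacked=True draws each row's columns on top of one another.
ax = wide.plot(kind="bar", stacked=True)
ax.set_xlabel("Number of Features")
ax.set_ylabel("Importance")
plt.tight_layout()

out = Path("stacked_feature_importances.png")  # placeholder output path
plt.savefig(out)
```

Each bar's total height is then the sum of the importance values for that num_features, which is what distinguishes it from a count-based stack.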
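The problem is that you are using geom_bar, which does not expect a y aesthetic: it automatically computes counts for you based on the x aesthetic you specify.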
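If you want to specify y yourself, you should use geom_col, which requires both the x and y aesthetics. If you also include a fill aesthetic, the columns are stacked by default; you can change that by specifying position='dodge'.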
Using your example:
import plotnine as p9
(p9.ggplot(df)
+ p9.aes(x='num_features', y='importance', fill='features')
+ p9.geom_col())