How can I create a stacked bar plot in Python where the y-axis is NOT based on counts?
I have the following Pandas DataFrame (abbreviated here):
import pandas as pd

df = pd.DataFrame([
    ("Distal Lung AT2", 0.4269588779192778, 20),
    ("Lung Ciliated epithelial cells", 0.28642167657082035, 20),
    ("Distal Lung AT2", 0.4488207834077291, 15),
    ("Lung Ciliated epithelial cells", 0.27546336897259094, 15),
    ("Distal Lung AT2", 0.45502553604960105, 10),
    ("Lung Ciliated epithelial cells", 0.29080413886147555, 10),
    ("Distal Lung AT2", 0.48481604554028446, 5),
    ("Lung Ciliated epithelial cells", 0.3178232409599174, 5)],
    columns=["features", "importance", "num_features"])
I'd like to create a stacked bar plot where the x-axis represents num_features (so rows with the same num_features should be grouped together), the y-axis represents importance, and each bar in the plot has blocks colored by features.
I tried using plotnine for this, as follows:
from plotnine import ggplot, aes, geom_bar, xlab, ylab

plot = (
    ggplot(df, aes(x="num_features", y="importance", fill="features"))
    + geom_bar(stat="identity")
    + xlab("Number of Features")
    + ylab("")
)
However, when I try to save the plot so I can view it with ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png")), I get:
Traceback (most recent call last):
File "/home/mdanb/plot_top_features_iteratively.py", line 94, in <module>
plot_stacked_bar_plots(backwards_elim_dirs)
File "/home/mdanb/plot_top_features_iteratively.py", line 87, in plot_stacked_bar_plots
ggsave(plot, os.path.join(figure_path, "stacked_feature_importances.png"))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 736, in ggsave
return plot.save(*arg, **kwargs)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 724, in save
fig, p = self.draw(return_ggplot=True)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 203, in draw
self._build()
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/ggplot.py", line 311, in _build
layers.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 79, in compute_position
l.compute_position(layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/layer.py", line 393, in compute_position
data = self.position.compute_layer(data, params, layout)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 56, in compute_layer
return groupby_apply(data, 'PANEL', fn)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/utils.py", line 638, in groupby_apply
lst.append(func(d, *args, **kwargs))
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position.py", line 54, in fn
return cls.compute_panel(pdata, scales, params)
File "/home/mdanb/.local/lib/python3.8/site-packages/plotnine/positions/position_stack.py", line 85, in compute_panel
trans = scales.y.trans
AttributeError: 'scale_y_discrete' object has no attribute 'trans'
I also looked into using Pandas directly, without plotnine, based on this post. However, it doesn't quite address my issue, because the bar plot there is stacked based on counts, whereas I specifically want to stack it based on the values of a column (importance).
The problem is that you are using geom_bar, which by default doesn't expect a y aesthetic: it automatically computes the counts for you based on the x aesthetic you specify.
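One more thing worth checking: the scale_y_discrete in your traceback suggests plotnine treated importance as a discrete variable, which usually means the column is not numeric in the real data (strings or categoricals, say), even though it holds floats in the abbreviated frame. I can't see your full data, so the frame below is a hypothetical stand-in with string values; it is only a sketch of how you might check and coerce the column with pandas.

```python
import pandas as pd

# Hypothetical frame where the numbers arrived as strings (for example from a
# CSV with stray quotes). This is NOT the asker's actual data.
raw = pd.DataFrame(
    [("Distal Lung AT2", "0.4269588779192778", 20),
     ("Lung Ciliated epithelial cells", "0.28642167657082035", 20)],
    columns=["features", "importance", "num_features"],
)

# A non-numeric dtype here would lead to a discrete y scale.
is_numeric_before = pd.api.types.is_numeric_dtype(raw["importance"])

# Coerce to floats; errors="raise" fails loudly on values that cannot be parsed.
fixed = raw.assign(importance=pd.to_numeric(raw["importance"], errors="raise"))
is_numeric_after = pd.api.types.is_numeric_dtype(fixed["importance"])
```

If df["importance"] already reports a float dtype in your script, this is not the cause and you can skip the conversion.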
If you want to specify y manually, you should use geom_col, which expects both an x and a y aesthetic. If you include a fill aesthetic, the default behaviour is to stack the columns, which you can change by specifying position='dodge'.
Using your example:
import plotnine as p9
(p9.ggplot(df)
+ p9.aes(x='num_features', y='importance', fill='features')
+ p9.geom_col())
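If you would rather stay in Pandas, as you mention trying, one possible route is to pivot the frame so each feature becomes a column and then draw stacked bars from the values themselves rather than from counts. This is a rough sketch, not the approach from the post you linked; the helper plot_stacked and its path argument are names I made up, and the plotting step assumes matplotlib is installed.

```python
import pandas as pd

df = pd.DataFrame([
    ("Distal Lung AT2", 0.4269588779192778, 20),
    ("Lung Ciliated epithelial cells", 0.28642167657082035, 20),
    ("Distal Lung AT2", 0.4488207834077291, 15),
    ("Lung Ciliated epithelial cells", 0.27546336897259094, 15),
    ("Distal Lung AT2", 0.45502553604960105, 10),
    ("Lung Ciliated epithelial cells", 0.29080413886147555, 10),
    ("Distal Lung AT2", 0.48481604554028446, 5),
    ("Lung Ciliated epithelial cells", 0.3178232409599174, 5)],
    columns=["features", "importance", "num_features"])

# One row per num_features, one column per feature, cells hold importance.
wide = df.pivot(index="num_features", columns="features", values="importance")


def plot_stacked(table, path):
    """Draw stacked bars from the wide table and save them (hypothetical helper)."""
    ax = table.plot.bar(stacked=True)  # bar heights are the importance values
    ax.set_xlabel("Number of Features")
    ax.figure.savefig(path, bbox_inches="tight")
    return ax
```

For example, plot_stacked(wide, "stacked_feature_importances.png") would give one bar per num_features, with each bar's total height equal to the sum of the importances in that group.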